Impact of AI on Freelance Jobs
submitted by /u/valis2400
arXiv:2307.09312v4 Announce Type: replace-cross
Abstract: We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech in online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities separately. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies.
arXiv:2402.14095v1 Announce Type: cross
Abstract: Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth. Code is available at https://github.com/dyballa/zero-shot-generalization.
arXiv:2402.14578v1 Announce Type: cross
Abstract: In this paper, we consider a deterministic online linear regression model where we allow the responses to be multivariate. To address this problem, we introduce MultiVAW, a method that extends the well-known Vovk-Azoury-Warmuth algorithm to the multivariate setting, and show that it also enjoys logarithmic regret in time. We apply our results to the online hierarchical forecasting problem and recover an algorithm from this literature as a special case, allowing us to relax the hypotheses usually made for its analysis.
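For orientation, the classical scalar-response Vovk-Azoury-Warmuth forecaster that the abstract builds on can be sketched as follows. This is not the paper's MultiVAW method, only the well-known single-output baseline; the function name `vaw_forecaster` and the regularization parameter `lam` are illustrative assumptions.

```python
def vaw_forecaster(xs, ys, lam=1.0):
    """Scalar-response Vovk-Azoury-Warmuth forecaster (illustrative sketch).

    At round t the prediction is x_t^T A_t^{-1} b_{t-1}, where
      A_t     = lam * I + sum_{s <= t} x_s x_s^T   (includes the current x_t)
      b_{t-1} = sum_{s < t} y_s x_s                (past responses only)
    `lam` is a hypothetical regularization parameter chosen for illustration.
    """
    d = len(xs[0])
    # Inverse of A_0 = lam * I.
    a_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [0.0] * d
    preds = []
    for x, y in zip(xs, ys):
        # Sherman-Morrison rank-one update of the inverse with the current x.
        ax = [sum(a_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(x[i] * ax[i] for i in range(d))
        a_inv = [[a_inv[i][j] - ax[i] * ax[j] / denom for j in range(d)]
                 for i in range(d)]
        # Predict before the response y is revealed.
        w = [sum(a_inv[i][j] * b[j] for j in range(d)) for i in range(d)]
        preds.append(sum(w[i] * x[i] for i in range(d)))
        # The response is revealed; accumulate it for later rounds.
        b = [b[i] + y * x[i] for i in range(d)]
    return preds
```

The only difference from online ridge regression is that the current feature vector enters the regularized Gram matrix before the prediction is made.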
arXiv:2402.14031v1 Announce Type: cross
Abstract: This paper presents a novel autoencoder with ordered variance (AEO) in which the loss function is modified with a variance regularization term to enforce order in the latent space. Further, the autoencoder is modified using ResNets, which results in a ResNet AEO (RAEO). The paper also illustrates the effectiveness of AEO and RAEO in extracting nonlinear relationships among input variables in an unsupervised setting.
arXiv:2402.14759v1 Announce Type: new
Abstract: The purpose of this paper is to look into how central notions in statistical learning theory, such as realisability, generalise under the assumption that train and test distribution are issued from the same credal set, i.e., a convex set of probability distributions. This can be considered as a first step towards a more general treatment of statistical learning under epistemic uncertainty.
arXiv:2402.14646v1 Announce Type: new
Abstract: This work introduces reduced models based on Continuous Low Rank Adaptation (CoLoRA) that pre-train neural networks for a given partial differential equation and then continuously adapt low-rank weights in time to rapidly predict the evolution of solution fields at new physics parameters and new initial conditions. The adaptation can be either purely data-driven or via an equation-driven variational approach that provides Galerkin-optimal approximations. Because CoLoRA approximates solution fields locally in time, the rank of the weights can be kept small, which means that only a few training trajectories are required offline, so that CoLoRA is well suited for data-scarce regimes. Predictions with CoLoRA are orders of magnitude faster than with classical methods, and their accuracy and parameter efficiency are higher compared to other neural network approaches.
arXiv:2402.14532v1 Announce Type: new
Abstract: Obtaining heteroscedastic predictive uncertainties from a Bayesian Neural Network (BNN) is vital to many applications. Often, heteroscedastic aleatoric uncertainties are learned as outputs of the BNN in addition to the predictive means; however, doing so may necessitate adding more learnable parameters to the network. In this work, we demonstrate that both the heteroscedastic aleatoric and epistemic variance can be embedded into the variances of learned BNN parameters, improving predictive performance for lightweight networks. By complementing this approach with a moment propagation approach to inference, we introduce a relatively simple framework for sampling-free variational inference suitable for lightweight BNNs.
arXiv:2402.14481v1 Announce Type: new
Abstract: We introduce the concept of Automated Causal Discovery (AutoCD), defined as any system that aims to fully automate the application of causal discovery and causal reasoning methods. AutoCD's goal is to deliver all causal information that an expert human analyst would and answer a user's causal queries. We describe the architecture of such a platform, and illustrate its performance on synthetic data sets. As a case study, we apply it on temporal telecommunication data. The system is general and can be applied to a plethora of causal discovery problems.
arXiv:2402.14385v1 Announce Type: new
Abstract: Achieving net zero carbon emissions by 2050 requires the integration of increasing amounts of wind power into power grids. This energy source poses a challenge to system operators due to its variability and uncertainty. Therefore, accurate forecasting of wind power is critical for grid operation and system balancing. This paper presents an innovative approach to short-term (1 to 6 hour horizon) wind power forecasting at a national level. The method leverages Automated Deep Learning combined with Numerical Weather Predictions wind speed maps to accurately forecast wind power.
arXiv:2402.14384v1 Announce Type: new
Abstract: In this paper, we employ a 1D deep convolutional generative adversarial network (DCGAN) for sequential anomaly detection in energy time series data. Anomaly detection involves gradient descent to reconstruct energy sub-sequences, identifying the noise vector that closely generates them through the generator network. Soft-DTW is used as a differentiable alternative for the reconstruction loss and is found to be superior to Euclidean distance. Combining reconstruction loss and the latent space's prior probability distribution serves as the anomaly score. Our novel method accelerates detection by parallel computation of reconstruction of multiple points and shows promise in identifying anomalous energy consumption in buildings, as evidenced by performing experiments on hourly energy time series from 15 buildings.
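As a rough illustration of the Soft-DTW quantity mentioned in the abstract, the forward recursion can be sketched in plain Python. This is a minimal sketch for scalar sequences with a squared-difference cost; it computes only the value, not the gradients a training loop would need, and the names `soft_min`, `soft_dtw`, and the smoothing parameter `gamma` are illustrative assumptions rather than the paper's code.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(a, b, gamma=1.0):
    """Soft-DTW value between two scalar sequences (illustrative sketch).

    Uses the standard DTW recursion with min replaced by soft_min.
    `gamma` > 0 is an assumed smoothing parameter; as gamma approaches 0
    the value approaches classical DTW with squared-difference cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma
            )
    return r[n][m]
```

Because the smoothed minimum is differentiable in its arguments, the resulting value can serve as a loss for gradient-based reconstruction, which is the role the abstract assigns to it.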
arXiv:2402.14080v1 Announce Type: new
Abstract: Deep learning models are being adopted and applied on various critical decision-making tasks, yet they are trained to provide point predictions without providing degrees of confidence. The trustworthiness of deep learning models can be increased if paired with uncertainty estimations. Conformal Prediction has emerged as a promising method to pair machine learning models with prediction intervals, allowing for a view of the model's uncertainty. However, popular uncertainty estimation methods for conformal prediction fail to provide heteroskedastic intervals that are equally accurate for all samples. In this paper, we propose a method to estimate the uncertainty of each sample by calculating the variance obtained from a Deep Regression Forest. We show that the deep regression forest variance improves the efficiency and coverage of normalized inductive conformal prediction on a drug response prediction task.
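The normalized inductive conformal prediction step named in the abstract can be sketched as follows. This is a minimal sketch assuming that point predictions and per-sample difficulty estimates (`cal_sigma`, `sigma`) are already available; in the paper those estimates come from a Deep Regression Forest variance, which is not reproduced here. All function and parameter names are illustrative.

```python
import math


def normalized_icp_quantile(cal_y, cal_pred, cal_sigma, alpha=0.1):
    """Calibration quantile for normalized inductive conformal prediction.

    Nonconformity score: |y - prediction| / sigma, where sigma is an
    assumed per-sample difficulty estimate (hypothetical input here).
    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score.
    """
    scores = sorted(abs(y - p) / s for y, p, s in zip(cal_y, cal_pred, cal_sigma))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return float("inf")
    return scores[k - 1]


def predict_interval(pred, sigma, q):
    """Interval for a new sample, scaled by that sample's sigma."""
    return pred - q * sigma, pred + q * sigma
```

Scaling the quantile by a per-sample `sigma` is what makes the interval widths vary from sample to sample, so the quality of the intervals depends on how well `sigma` tracks the actual error.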
arXiv:2402.13353v1 Announce Type: cross
Abstract: Detecting and analyzing various defect types in semiconductor materials is an important prerequisite for understanding the underlying mechanisms as well as tailoring the production processes. Analysis of microscopy images that reveal defects typically requires image analysis tasks such as segmentation and object detection. With the ever-increasing amount of data produced by experiments, handling these tasks manually becomes increasingly infeasible. In this work, we combine various image analysis and data mining techniques to create a robust and accurate automated image analysis pipeline. This allows for extracting the type and position of all defects in a microscopy image of a KOH-etched 4H-SiC wafer that was stitched together from approximately 40,000 individual images.
arXiv:2402.13929v1 Announce Type: cross
Abstract: We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our distilled SDXL-Lightning models both as LoRA and full UNet weights.
arXiv:2402.13654v1 Announce Type: cross
Abstract: This paper presents a learning-based control strategy for non-linear throttle valves with an asymmetric hysteresis, leading to a near-optimal controller without requiring any prior knowledge about the environment. We start with a carefully tuned Proportional Integrator (PI) controller and exploit the recent advances in Reinforcement Learning (RL) with Guides to improve the closed-loop behavior by learning from the additional interactions with the valve. We test the proposed control method in various scenarios on three different valves, all highlighting the benefits of combining both PI and RL frameworks to improve control performance in non-linear stochastic systems. In all the experimental test cases, the resulting agent has a better sample efficiency than traditional RL agents and outperforms the PI controller.
arXiv:2402.13613v1 Announce Type: cross
Abstract: This paper presents a comprehensive overview of the Comparative Opinion Mining from Vietnamese Product Reviews shared task (ComOM), held as part of the 10$^{th}$ International Workshop on Vietnamese Language and Speech Processing (VLSP 2023). The primary objective of this shared task is to advance the field of natural language processing by developing techniques that proficiently extract comparative opinions from Vietnamese product reviews. Participants are challenged to propose models that adeptly extract a comparative "quintuple" from a comparative sentence, encompassing Subject, Object, Aspect, Predicate, and Comparison Type Label. We construct a human-annotated dataset comprising $120$ documents, encompassing $7427$ non-comparative sentences and $2468$ comparisons within $1798$ sentences. Participating models undergo evaluation and ranking based on the Exact match macro-averaged quintuple F1 score.
arXiv:2402.13608v1 Announce Type: cross
Abstract: This study proposes a trainable sampling-based solver for combinatorial optimization problems (COPs) using a deep-learning technique called deep unfolding. The proposed solver is based on the Ohzeki method that combines Markov-chain Monte-Carlo (MCMC) and gradient descent, and its step sizes are trained by minimizing a loss function. In the training process, we propose a sampling-based gradient estimation that substitutes auto-differentiation with a variance estimation, thereby circumventing the failure of back propagation due to the non-differentiability of MCMC. The numerical results for a few COPs demonstrated that the proposed solver significantly accelerated the convergence speed compared with the original Ohzeki method.
arXiv:2402.13528v1 Announce Type: cross
Abstract: Current research concentrates on studying discussions on social media related to structural failures to improve disaster response strategies. However, detecting social web posts discussing concerns about anticipatory failures is under-explored. If such concerns are channeled to the appropriate authorities, it can aid in the prevention and mitigation of potential infrastructural failures. In this paper, we develop an infrastructure ombudsman that automatically detects specific infrastructure concerns. Our work considers several recent structural failures in the US. We present a first-of-its-kind dataset of 2,662 social web instances for this novel task, mined from Reddit and YouTube.
arXiv:2402.13285v1 Announce Type: cross
Abstract: In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the considered theoretical framework. This limits the scope of such bounds, as other forms of capacity measures or regularizations are used in algorithms. In this paper, we leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures. One trick to prove such a result involves considering a commonly used family of distributions: the Gibbs distributions. Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap as it can be customized to fit both the hypothesis class and the task.
arXiv:2402.13852v1 Announce Type: new
Abstract: Precise glucose level management is pivotal for individuals with diabetes, averting severe complications. In this work, we introduce a novel neural control system for continuous glucose monitoring and maintenance, utilizing differential predictive control. Our system, guided by a sophisticated neural policy and differentiable modeling, dynamically adjusts insulin delivery in real-time, enhancing glucose optimization. This end-to-end approach maximizes efficiency, ensuring personalized care and improved health outcomes, as affirmed by empirical findings.
arXiv:2402.13531v1 Announce Type: new
Abstract: We provide an improved analysis of standard differentially private gradient descent for linear regression under the squared error loss. Under modest assumptions on the input, we characterize the distribution of the iterate at each time step.
Our analysis leads to new results on the algorithm's accuracy: for a proper fixed choice of hyperparameters, the sample complexity depends only linearly on the dimension of the data. This matches the dimension-dependence of the (non-private) ordinary least squares estimator as well as that of recent private algorithms that rely on sophisticated adaptive gradient-clipping schemes (Varshney et al., 2022; Liu et al., 2023).
Our analysis of the iterates' distribution also allows us to construct confidence intervals for the empirical optimizer which adapt automatically to the variance of the algorithm on a particular data set. We validate our theorems through experiments on synthetic data.
arXiv:2402.13525v1 Announce Type: new
Abstract: Recent years have seen the explosion of edge intelligence with powerful Deep Neural Networks (DNNs). One popular scheme is training DNNs on powerful cloud servers and subsequently porting them to mobile devices after making them lightweight. Conventional approaches manually specialize DNNs for various edge platforms and retrain them with real-world data. However, as the number of platforms increases, these approaches become labour-intensive and computationally prohibitive. Additionally, real-world data tends to be sparsely labelled, further increasing the difficulty of training lightweight models. In this paper, we propose MatchNAS, a novel scheme for porting DNNs to mobile devices. Specifically, we simultaneously optimise a large network family using both labelled and unlabelled data and then automatically search for tailored networks for different hardware platforms. MatchNAS acts as an intermediary that bridges the gap between cloud-based DNNs and edge-based DNNs.
arXiv:2401.15567v2 Announce Type: replace-cross
Abstract: We present new concentration inequalities for either martingale dependent or exchangeable random symmetric matrices under a variety of tail conditions, ranging from now-standard Chernoff bounds to self-normalized heavy-tailed settings. These inequalities are often randomized in a way that renders them strictly tighter than existing deterministic results in the literature, are typically expressed in the Loewner order, and are sometimes valid at arbitrary data-dependent stopping times. Along the way, we explore the theory of positive semidefinite supermartingales and maximal inequalities, a natural matrix analog of scalar nonnegative supermartingales that is potentially of independent interest.
The February NVIDIA Studio Driver, designed specifically to optimize creative apps, is now available for download.
Editor’s note: This post is part of Into the Omniverse, a series focused on how artists, developers and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse. The combination of powerful 3D tools and groundbreaking technologies can transform the way designers bring their visions to life — and Universal Scene
Top-tier games from publishing partners Bandai Namco Entertainment and Inflexion Games are joining GeForce NOW this week as the cloud streaming service’s fourth-anniversary celebrations continue. Eleven new titles join the over 1,800 supported games in the GeForce NOW library, including Nightingale from Inflexion Games and Bandai Namco Entertainment’s Tales of Arise, Katamari Damacy REROLL and
“I just got back from GTC and ….” In four weeks, those will be among the most powerful words in your industry. But you won’t be able to use them if you haven’t been here. NVIDIA’s GTC 2024 transforms the San Jose Convention Center into a crucible of innovation, learning and community from March 18-21,
arXiv:2402.12558v1 Announce Type: new
Abstract: COVID-19 has affected almost every country in the world. The large number of infected people and the different mortality rates between countries have given rise to many hypotheses about the key factors that make the virus so lethal in some places. In this study, the eating habits of 170 countries were evaluated in order to find correlations between these habits and mortality rates caused by COVID-19, using machine learning techniques that group the countries together according to the different distribution of fat, energy, and protein across 23 different types of food, as well as the amount ingested in kilograms. Results show that obesity and high consumption of fats appear in countries with the highest death rates, whereas countries with a lower rate have a higher level of cereal consumption accompanied by a lower total average intake of kilocalories.
arXiv:2402.12939v1 Announce Type: new
Abstract: Understanding the behavior of deep reinforcement learning (DRL) agents is crucial for improving their performance and reliability. However, the complexity of their policies often makes them challenging to understand. In this paper, we introduce a new approach for investigating the behavior modes of DRL policies, which involves utilizing dimensionality reduction and trajectory clustering in the latent space of neural networks. Specifically, we use Pairwise Controlled Manifold Approximation Projection (PaCMAP) for dimensionality reduction and TRACLUS for trajectory clustering to analyze the latent space of a DRL policy trained on the Mountain Car control task. Our methodology helps identify diverse behavior patterns and suboptimal choices by the policy, thus allowing for targeted improvements. We demonstrate how our approach, combined with domain knowledge, can enhance a policy's performance in specific regions of the state space.
arXiv:2302.02181v2 Announce Type: replace-cross
Abstract: In this work, we propose a fast and accurate method to reconstruct activations of classification and semantic segmentation networks by stitching them with a GAN generator utilizing a 1x1 convolution. We test our approach on images of animals from the AFHQ wild dataset, ImageNet1K, and real-world digital pathology scans of stained tissue samples. Our results show comparable performance to established gradient descent methods but with a processing time that is two orders of magnitude faster, making this approach promising for practical applications.
arXiv:2206.02911v2 Announce Type: replace
Abstract: A general setup for deterministic system identification problems on graphs with Dirichlet and Neumann boundary conditions is introduced. When control nodes are available along the boundary, we apply a discretize-then-optimize method to estimate an optimal control. A key piece in the present architecture is our boundary injected message passing neural network. This will produce more accurate predictions that are considerably more stable in proximity of the boundary. Also, a regularization technique based on graphical distance is introduced that helps with stabilizing the predictions at nodes far from the boundary.
arXiv:2402.13001v1 Announce Type: cross
Abstract: Graph states are used to represent mathematical graphs as quantum states on quantum computers. They can be formulated through stabilizer codes or directly through quantum gates and quantum states. In this paper we show that a quantum graph neural network model can be understood and realized based on graph states. We show that they can be used either as parameterized quantum circuits to represent neural networks or as an underlying structure to construct graph neural networks on quantum computers.
arXiv:2402.12890v1 Announce Type: cross
Abstract: This paper explores an empirical approach to learning more discriminative sentence representations in an unsupervised fashion. Leveraging semantic graph smoothing, we enhance sentence embeddings obtained from pretrained models to improve results for the text clustering and classification tasks. Our method, validated on eight benchmarks, demonstrates consistent improvements, showcasing the potential of semantic graph smoothing in improving sentence embeddings for the supervised and unsupervised document categorization tasks.
arXiv:2402.12617v1 Announce Type: cross
Abstract: Generative AI's expanding footprint across numerous industries has led to both excitement and increased scrutiny. This paper delves into the unique security challenges posed by Generative AI, and outlines potential research directions for managing these risks.
arXiv:2402.12479v1 Announce Type: new
Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional networks and exhibit a type of "scaling law", using only a small fraction of the full network parameters.
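Gradual magnitude pruning, as named in the abstract, can be sketched as a sparsity schedule plus a magnitude-based mask. The cubic schedule below is one commonly used choice from the pruning literature; the abstract does not state which schedule or settings the paper uses, so the schedule, the function names, and all parameters here are illustrative assumptions.

```python
def target_sparsity(step, start_step, end_step, final_sparsity,
                    initial_sparsity=0.0):
    """Cubic sparsity schedule (assumed here, not taken from the paper).

    Sparsity rises from initial_sparsity at start_step to final_sparsity
    at end_step, quickly at first and more slowly near the end.
    """
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights.

    `weights` is a flat list of floats; `sparsity` is the fraction to prune.
    """
    k = int(round(sparsity * len(weights)))  # number of weights to prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(weights))]
```

In a training loop, the mask would be recomputed at intervals from the current target sparsity and multiplied into the weights, so the network shrinks gradually rather than all at once.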
arXiv:2402.12424v1 Announce Type: new
Abstract: In this paper, we investigate the effectiveness of various LLMs in interpreting tabular data through different prompting strategies and data formats. Our analysis extends across six benchmarks for table-related tasks such as question-answering and fact-checking. We introduce for the first time the assessment of LLMs' performance on image-based table representations. Specifically, we compare five text-based and three image-based table representations, demonstrating the influence of representation and prompting on LLM performance. Our study provides insights into the effective use of LLMs on table-related tasks.
NVIDIA, in collaboration with Google, today launched optimizations across all NVIDIA AI platforms for Gemma — Google’s state-of-the-art new lightweight 2 billion– and 7 billion-parameter open language models that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases. Teams from the companies worked closely together to accelerate the performance of
arXiv:2311.04256v3 Announce Type: replace-cross
Abstract: Hesitant fuzzy sets are widely used in certain instances of uncertainty and hesitation. In sets, the inclusion relationship is an important and foundational definition. Thus, as a kind of set, hesitant fuzzy sets require an explicit definition of inclusion relationship. Based on the hesitant fuzzy membership degree of discrete form, several kinds of inclusion relationships for hesitant fuzzy sets are proposed in this work. Then, some foundational propositions of hesitant fuzzy sets are presented, along with propositions of families of hesitant fuzzy sets. Some foundational propositions of hesitant fuzzy information systems are proposed with respect to parameter reductions and an example and an algorithm are given to illustrate the processes of parameter reduction. Finally, a multi-strength intelligent classifier is proposed to make health state diagnoses for complex systems.
arXiv:2402.10983v1 Announce Type: new
Abstract: Neural networks demonstrate inherent vulnerability to small, non-random perturbations, emerging as adversarial attacks. Such attacks, born from the gradient of the loss function relative to the input, are discerned as input conjugates, revealing a systemic fragility within the network structure. Intriguingly, a mathematical congruence manifests between this mechanism and the quantum physics' uncertainty principle, casting light on a hitherto unanticipated interdisciplinarity. This inherent susceptibility within neural network systems is generally intrinsic, highlighting not only the innate vulnerability of these networks but also suggesting potential advancements in the interdisciplinary area for understanding these black-box networks.
( 2
min )
arXiv:2308.05724v2 Announce Type: replace
Abstract: Deep learning training algorithms have been a huge success in recent years in many fields, including speech, text, image, and video. Deeper and deeper networks have been proposed with great success, with ResNet structures having around 152 layers. Shallow convolutional neural networks (CNNs) are still an active area of research, where some phenomena remain unexplained. Activation functions used in the network are of utmost importance, as they provide nonlinearity to the networks. ReLUs are the most commonly used activation function. We show a complex piecewise linear (PWL) activation in the hidden layer. We show that these PWL activations work much better than ReLU activations in our networks for convolutional neural networks and multilayer perceptrons. Result comparisons in PyTorch for shallow and deep CNNs are given to further strengthen our case.
( 2
min )
arXiv:2307.05209v3 Announce Type: replace-cross
Abstract: Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
( 2
min )
arXiv:2307.01649v2 Announce Type: replace
Abstract: Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.
( 2
min )
arXiv:2402.12271v1 Announce Type: cross
Abstract: Federated learning enables multiple data owners to collaboratively train robust machine learning models without transferring large or sensitive local datasets by only sharing the parameters of the locally trained models. In this paper, we elaborate on the design of our Advanced Privacy-Preserving Federated Learning (APPFL) framework, which streamlines end-to-end secure and reliable federated learning experiments across cloud computing facilities and high-performance computing resources by leveraging Globus Compute, a distributed function as a service platform, and Amazon Web Services. We further demonstrate the use case of APPFL in fine-tuning a LLaMA 2 7B model using several cloud resources and supercomputers.
( 2
min )
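The abstract above says that participants share only the parameters of locally trained models. One common way a server combines such parameters is a sample-weighted average (often called FedAvg). The sketch below is a minimal illustration under that assumption; it is not taken from APPFL, and the name `federated_average` and the flat-list parameter layout are hypothetical.

```python
def federated_average(client_params, client_sizes):
    """Sample-weighted average of per-client parameter vectors.

    client_params: list of equal-length lists of floats (one per client).
    client_sizes:  number of local training samples held by each client.
    Hypothetical helper for illustration; not an APPFL API.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two clients; the second holds three times as much data, so it gets 3x the weight.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the parameter vectors and sample counts cross the network in this scheme; the raw local datasets never leave the clients.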
arXiv:2402.12072v1 Announce Type: cross
Abstract: This paper attempts to provide an overview of current approaches for solving inverse problems in imaging using variational methods and machine learning. A special focus lies on point estimators and their robustness against adversarial perturbations. In this context results of numerical experiments for a one-dimensional toy problem are provided, showing the robustness of different approaches and empirically verifying theoretical guarantees. Another focus of this review is the exploration of the subspace of data consistent solutions through explicit guidance to satisfy specific semantic or textural properties.
( 2
min )
arXiv:2402.11997v1 Announce Type: cross
Abstract: Large Language Models (LLMs) are increasingly becoming ubiquitous, yet their ability to reason about and retain temporal information remains limited. This hinders their application in real-world scenarios where understanding the sequential nature of events is crucial. This paper experiments with state-of-the-art models on a novel, large-scale temporal dataset, TempUN, to reveal significant limitations in temporal retention and reasoning abilities. Interestingly, closed-source models indicate knowledge gaps more frequently, potentially suggesting a trade-off between uncertainty awareness and incorrect responses. Further, exploring various fine-tuning approaches yielded no major performance improvements. The associated dataset and code are available at the following URL (https://github.com/lingoiitgn/TempUN).
( 2
min )
arXiv:2402.11985v1 Announce Type: cross
Abstract: Weakly supervised object detection (WSup-OD) increases the usefulness and interpretability of image classification algorithms without requiring additional supervision. The successes of multiple instance learning in this task for natural images, however, do not translate well to medical images due to the very different characteristics of their objects (i.e. pathologies). In this work, we propose Weakly Supervised ROI Proposal Networks (WSRPN), a new method for generating bounding box proposals on the fly using a specialized region of interest-attention (ROI-attention) module. WSRPN integrates well with classic backbone-head classification algorithms and is end-to-end trainable with only image-label supervision. We experimentally demonstrate that our new method outperforms existing methods in the challenging task of disease localization in chest X-ray images. Code: https://github.com/philip-mueller/wsrpn
( 2
min )
arXiv:2402.11809v1 Announce Type: cross
Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedups ranging from 2.7x to 4.0x on HumanEval-X while maintaining output quality.
( 2
min )
arXiv:2402.11728v1 Announce Type: cross
Abstract: In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. Furthermore, we demonstrate the practical utility of our proposed model by constructing a novel measure, "optimism". We also observe the dependence of earnings surprise and return on our optimism measure. Our dataset, models, and code will be made publicly available (under a CC BY 4.0 license) on GitHub and Hugging Face.
( 2
min )
arXiv:2402.11670v1 Announce Type: cross
Abstract: In this study, we explore the explainability of neural networks in agriculture and forestry, specifically in fertilizer treatment classification and wood identification. The opaque nature of these models, often considered 'black boxes', is addressed through an extensive evaluation of state-of-the-art Attribution Maps (AMs), also known as class activation maps (CAMs) or saliency maps. Our comprehensive qualitative and quantitative analysis of these AMs uncovers critical practical limitations. Findings reveal that AMs frequently fail to consistently highlight crucial features and often misalign with the features considered important by domain experts. These discrepancies raise substantial questions about the utility of AMs in understanding the decision-making process of neural networks. Our study provides critical insights into the trustworthiness and practicality of AMs within the agriculture and forestry sectors, thus facilitating a better understanding of neural networks in these application areas.
( 2
min )
arXiv:2402.11485v1 Announce Type: cross
Abstract: Adapting English-based large language models (LLMs) to other languages has become increasingly popular due to the efficiency and potential of cross-lingual transfer. However, existing language adaptation methods often overlook the benefits of cross-lingual supervision. In this study, we introduce LEIA, a language adaptation tuning method that utilizes Wikipedia entity names aligned across languages. This method involves augmenting the target language corpus with English entity names and training the model using left-to-right language modeling. We assess LEIA on diverse question answering datasets using 7B-parameter LLMs, demonstrating significant performance gains across various non-English languages. The source code is available at https://github.com/studio-ousia/leia.
( 2
min )
arXiv:2308.08925v3 Announce Type: cross
Abstract: In this paper, we tackle the challenge of white-box false positive adversarial attacks on contrastive loss based offline handwritten signature verification models. We propose a novel attack method that treats the attack as a style transfer between closely related but distinct writing styles. To guide the generation of deceptive images, we introduce two new loss functions that enhance the attack success rate by perturbing the Euclidean distance between the embedding vectors of the original and synthesized samples, while ensuring minimal perturbations by reducing the difference between the generated image and the original image. Our method demonstrates state-of-the-art performance in white-box attacks on contrastive loss based offline handwritten signature verification models, as evidenced by our experiments. The key contributions of this paper include a novel false positive attack method, two new loss functions, effective style transfer in handwriting styles, and superior performance in white-box false positive attacks compared to other white-box attack methods.
( 3
min )
arXiv:2402.12269v1 Announce Type: new
Abstract: We present a novel end-to-end deep learning-based approach for Supervised Graph Prediction (SGP). We introduce an original Optimal Transport (OT)-based loss, the Partially-Masked Fused Gromov-Wasserstein loss (PM-FGW), that makes it possible to directly leverage graph representations such as adjacency and feature matrices. PM-FGW exhibits all the desirable properties for SGP: it is node permutation invariant, sub-differentiable, and handles graphs of different sizes by comparing their padded representations as well as their masking vectors. Moreover, we present a flexible transformer-based architecture that easily adapts to different types of input data. In the experimental section, three different tasks (a novel and challenging synthetic dataset, image2graph, and two real-world tasks, image2map and fingerprint2molecule) showcase the efficiency and versatility of the approach compared to competitors.
( 2
min )
arXiv:2402.12231v1 Announce Type: new
Abstract: Ordinary differential equations (ODEs) are widely used to describe dynamical systems in science, but identifying parameters that explain experimental measurements is challenging. In particular, although ODEs are differentiable and would allow for gradient-based parameter optimization, the nonlinear dynamics of ODEs often lead to many local minima and extreme sensitivity to initial conditions. We therefore propose diffusion tempering, a novel regularization technique for probabilistic numerical methods which improves convergence of gradient-based parameter optimization in ODEs. By iteratively reducing a noise parameter of the probabilistic integrator, the proposed method converges more reliably to the true parameters. We demonstrate that our method is effective for dynamical systems of different complexity and show that it obtains reliable parameter estimates for a Hodgkin-Huxley model with a practically relevant number of parameters.
( 2
min )
arXiv:2402.12067v1 Announce Type: new
Abstract: Visual navigation requires a whole range of capabilities. A crucial one of these is the ability of an agent to determine its own location and heading in an environment. Prior works commonly assume this information as given, or use methods which lack a suitable inductive bias and accumulate error over time. In this work, we show how the method of slow feature analysis (SFA), inspired by neuroscience research, overcomes both limitations by generating interpretable representations of visual data that encode location and heading of an agent. We employ SFA in a modern reinforcement learning context, analyse and compare representations and illustrate where hierarchical SFA can outperform other feature extractors on navigation tasks.
( 2
min )
arXiv:2402.11942v1 Announce Type: new
Abstract: We investigate the training and generalization errors of overparameterized neural networks (NNs) with a wide class of leaky rectified linear unit (ReLU) functions. More specifically, we carefully upper bound both the convergence rate of the training error and the generalization error of such NNs and investigate the dependence of these bounds on the Leaky ReLU parameter, $\alpha$. We show that $\alpha =-1$, which corresponds to the absolute value activation function, is optimal for the training error bound. Furthermore, in special settings, it is also optimal for the generalization error bound. Numerical experiments empirically support the practical choices guided by the theory.
( 2
min )
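The claim in the abstract above that $\alpha = -1$ corresponds to the absolute value function follows directly from the usual Leaky ReLU definition (identity for non-negative inputs, slope $\alpha$ for negative inputs). A minimal sketch, assuming that standard definition; the function name and sample values are illustrative only:

```python
def leaky_relu(x: float, alpha: float) -> float:
    """Leaky ReLU: identity for x >= 0, slope alpha for x < 0."""
    return x if x >= 0 else alpha * x


samples = [-2.0, -0.5, 0.0, 1.5]

# With alpha = -1 the negative branch becomes -x, so the output equals |x|.
abs_like = [leaky_relu(x, -1.0) for x in samples]

# A small positive alpha gives the familiar "leaky" behaviour instead.
leaky = [leaky_relu(x, 0.01) for x in samples]
```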
arXiv:2402.11877v1 Announce Type: new
Abstract: Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we delve into the sample complexity of $Q$-learning when integrated with a model-based approach. Through theoretical analyses and empirical evaluations, we seek to elucidate the conditions under which model-based $Q$-learning excels in terms of sample efficiency compared to its model-free counterpart.
( 2
min )
ZOO Digital provides end-to-end localization and media services to adapt original TV and movie content to different languages, regions, and cultures. It makes globalization easier for the world’s best content creators. Trusted by the biggest names in entertainment, ZOO Digital delivers high-quality localization and media services at scale, including dubbing, subtitling, scripting, and compliance. Typical […]
( 11
min )
Using a machine-learning algorithm, researchers can predict interactions that could interfere with a drug’s effectiveness.
( 6
min )
arXiv:2402.10248v1 Announce Type: new
Abstract: Global ambient air pollution, a transboundary challenge, is typically addressed through interventions relying on data from spatially sparse and heterogeneously placed monitoring stations. These stations often encounter temporal data gaps due to issues such as power outages. In response, we have developed a scalable, data-driven, supervised machine learning framework. This model is designed to impute missing temporal and spatial measurements, thereby generating a comprehensive dataset for pollutants including NO$_2$, O$_3$, PM$_{10}$, PM$_{2.5}$, and SO$_2$. The dataset, with a fine granularity of 0.25$^{\circ}$ at hourly intervals and accompanied by prediction intervals for each estimate, caters to a wide range of stakeholders relying on outdoor air pollution data for downstream assessments. This enables more detailed studies. Additionally, the model's performance across various geographical locations is examined, providing insights and recommendations for strategic placement of future monitoring stations to further enhance the model's accuracy.
( 2
min )
arXiv:2312.06528v4 Announce Type: replace
Abstract: Many neural network architectures are known to be Turing Complete, and can thus, in principle, implement arbitrary algorithms. However, Transformers are unique in that they can implement gradient-based learning algorithms under simple parameter configurations. This paper provides theoretical and empirical evidence that (non-linear) Transformers naturally learn to implement gradient descent in function space, which in turn enables them to learn non-linear functions in context. Our results apply to a broad class of combinations of non-linear architectures and non-linear in-context learning tasks. Additionally, we show that the optimal choice of non-linear activation depends in a natural way on the class of functions that need to be learned.
( 2
min )
arXiv:2310.06549v2 Announce Type: replace
Abstract: Label smoothing -- using softened labels instead of hard ones -- is a widely adopted regularization method for deep learning, showing diverse benefits such as enhanced generalization and calibration. Its implications for preserving model privacy, however, have remained unexplored. To fill this gap, we investigate the impact of label smoothing on model inversion attacks (MIAs), which aim to generate class-representative samples by exploiting the knowledge encoded in a classifier, thereby inferring sensitive information about its training data. Through extensive analyses, we uncover that traditional label smoothing fosters MIAs, thereby increasing a model's privacy leakage. Even more, we reveal that smoothing with negative factors counters this trend, impeding the extraction of class-related information and leading to privacy preservation, beating state-of-the-art defenses. This establishes a practical and powerful novel way for enhancing model resilience against MIAs.
( 2
min )
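To make "smoothing with negative factors" in the abstract above concrete, the sketch below uses the common label smoothing formulation, in which a one-hot target $y$ over $K$ classes becomes $(1-\alpha)\,y + \alpha/K$. Whether the paper uses exactly this parameterization is an assumption here, and `smooth_labels` is a hypothetical helper name.

```python
def smooth_labels(num_classes: int, true_class: int, alpha: float) -> list:
    """Return the smoothed target (1 - alpha) * one_hot + alpha / num_classes.

    Assumes the common label smoothing formula; alpha may be negative.
    """
    off_value = alpha / num_classes
    return [
        (1.0 - alpha) + off_value if k == true_class else off_value
        for k in range(num_classes)
    ]


# Positive factor: mass moves from the true class to the other classes.
positive = smooth_labels(num_classes=4, true_class=0, alpha=0.1)

# Negative factor: the true class target exceeds 1 and the others go below 0.
negative = smooth_labels(num_classes=4, true_class=0, alpha=-0.05)
```

In both cases the target entries still sum to 1; only the direction in which mass is shifted changes with the sign of the factor.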
arXiv:2402.10547v1 Announce Type: cross
Abstract: This paper tackles the scarcity of benchmarking data in disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground truth explanatory factors for evaluating disentanglement techniques. Benchmarking state-of-the-art methods on SynTone highlights its utility for method evaluation. Our results underscore strengths and limitations in audio disentanglement, motivating future research.
( 2
min )
arXiv:2402.10553v1 Announce Type: cross
Abstract: From robots that replace workers to robots that serve as helpful colleagues, the field of robotic automation is experiencing a new trend that represents a huge challenge for component manufacturers. The contribution starts from an innovative vision that sees an ever closer collaboration between the cobot, able to do a specific physical job with precision; the AI world, able to analyze information and support the decision-making process; and the human, able to have a strategic vision of the future.
( 2
min )
arXiv:2402.10747v1 Announce Type: new
Abstract: This paper presents a convolutional neural network model for precipitation nowcasting that combines data-driven learning with physics-informed domain knowledge. We propose LUPIN, a Lagrangian Double U-Net for Physics-Informed Nowcasting, that draws from existing extrapolation-based nowcasting methods and implements the Lagrangian coordinate system transformation of the data in a fully differentiable and GPU-accelerated manner to allow for real-time end-to-end training and inference. Based on our evaluation, LUPIN matches and exceeds the performance of the chosen benchmark, opening the door for other Lagrangian machine learning models.
( 2
min )
arXiv:2402.10492v1 Announce Type: new
Abstract: This research utilized three types of artificial neural network (ANN) methodologies, namely Backpropagation Neural Network (BPNN) with varied training, transfer, divide, and learning functions; Radial Basis Function Neural Network (RBFNN); and General Regression Neural Network (GRNN), to forecast the severity of stem rust. It considered parameters such as mean maximum temperature, mean minimum temperature, mean rainfall, mean average temperature, mean relative humidity, and different wheat varieties. The statistical analysis revealed that GRNN demonstrated effective predictive capability and required less training time compared to the other models. Additionally, the results indicated that total seasonal rainfall positively influenced the development of wheat stem rust.
Keywords: Wheat stem rust, Back propagation neural network, Radial Basis Function Neural Network, General Regression Neural Network.
( 2
min )
Amazon SageMaker multi-model endpoints (MMEs) are a fully managed capability of SageMaker inference that allows you to deploy thousands of models on a single endpoint. Previously, MMEs statically pre-allocated CPU computing power to models regardless of the model traffic load, using Multi Model Server (MMS) as the model server. In this post, we discuss a […]
( 13
min )
Modern chatbots can serve as digital agents, providing a new avenue for delivering 24/7 customer service and support across many industries. Their popularity stems from the ability to respond to customer inquiries in real time and handle multiple queries simultaneously in different languages. Chatbots also offer valuable data-driven insights into customer behavior while scaling effortlessly […]
( 12
min )
MIT engineers developed a tag that can reveal with near-perfect accuracy whether an item is real or fake. The key is in the glue on the back of the tag.
( 7
min )
arXiv:2308.14642v2 Announce Type: replace
Abstract: We study regret minimization in online episodic linear Markov Decision Processes, and obtain rate-optimal $\widetilde O (\sqrt K)$ regret where $K$ denotes the number of episodes. Our work is the first to establish the optimal (w.r.t. $K$) rate of convergence in the stochastic setting with bandit feedback using a policy optimization based approach, and the first to establish the optimal (w.r.t. $K$) rate in the adversarial setup with full information feedback, for which no algorithm with an optimal rate guarantee is currently known.
( 2
min )
arXiv:2209.03910v2 Announce Type: replace-cross
Abstract: We present PixTrack, a vision-based object pose tracking framework using novel view synthesis and deep feature-metric alignment. We follow an SfM-based relocalization paradigm where we use a Neural Radiance Field to canonically represent the tracked object. Our evaluations demonstrate that our method produces highly accurate, robust, and jitter-free 6DoF pose estimates of objects in both monocular RGB images and RGB-D images without the need for any data annotation or trajectory smoothing. Our method is also computationally efficient, making it easy to perform multi-object tracking with no alteration to our algorithm through simple CPU multiprocessing. Our code is available at: https://github.com/GiantAI/pixtrack
( 2
min )
arXiv:2402.10115v1 Announce Type: cross
Abstract: In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.
( 2
min )
arXiv:2402.09807v1 Announce Type: cross
Abstract: In this paper, we propose a Minimax Trust Region (MINIMAX-TR) algorithm and a Minimax Trust Region Algorithm with Contractions and Expansions (MINIMAX-TRACE) for solving nonconvex-strongly concave minimax problems. Both algorithms can find an $(\epsilon, \sqrt{\epsilon})$-second order stationary point (SSP) within $\mathcal{O}(\epsilon^{-1.5})$ iterations, which matches the best known iteration complexity.
( 2
min )
arXiv:2402.09786v1 Announce Type: cross
Abstract: Generative adversarial networks generate photorealistic faces that are often indistinguishable by humans from real faces. We find that the discriminator in the pre-trained StyleGAN3 model, a popular GAN network, systematically stratifies scores by both image- and face-level qualities and that this disproportionately affects images across gender, race, and other categories. We examine the discriminator's bias for color and luminance across axes of perceived race and gender; we then examine axes common in research on stereotyping in social psychology.
( 2
min )
arXiv:2402.09477v1 Announce Type: cross
Abstract: We introduce a privacy auditing scheme for ML models that relies on membership inference attacks using generated data as "non-members". This scheme, which we call PANORAMIA, quantifies the privacy leakage for large-scale ML models without control of the training process or model re-training and only requires access to a subset of the training data. To demonstrate its applicability, we evaluate our auditing scheme across multiple ML domains, ranging from image and tabular data classification to large-scale language models.
( 2
min )
arXiv:2402.09452v1 Announce Type: cross
Abstract: This paper examines the application of WiFi signals for real-world monitoring of daily activities in home healthcare scenarios. While the state-of-the-art of WiFi-based activity recognition is promising in lab environments, challenges arise in real-world settings due to environmental, subject, and system configuration variables, affecting accuracy and adaptability. The research involved deploying systems in various settings and analyzing data shifts. It aims to guide realistic development of robust, context-aware WiFi sensing systems for elderly care. The findings suggest a shift in WiFi-based activity sensing, bridging the gap between academic research and practical applications, enhancing life quality through technology.
( 2
min )
arXiv:2402.09419v1 Announce Type: cross
Abstract: A novel wavelet-like function is presented that makes it convenient to create filter banks given mainly two parameters that influence the focus area and the filter count. This is accomplished by computing the inverse Fourier transform of Gaussian functions on logarithmic frequency axes in the frequency domain. The resulting filters are similar to Gabor filters and represent oriented brief signal oscillations of different sizes. The wavelet-like function can be thought of as a generalized Log-Gabor filter that is multidimensional, always uses Gaussian functions on logarithmic frequency axes, and innately includes low-pass filters from Gaussian functions located at the frequency domain origin.
( 2
min )
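For orientation on "Gaussian functions on logarithmic frequency axes" in the abstract above, the sketch below evaluates the classic one-dimensional log-Gabor frequency response, $G(f) = \exp\!\big(-\ln^2(f/f_0) / (2 \ln^2 \sigma)\big)$. This is the standard filter that the paper generalizes, not the paper's own function; in particular, the classic form has no response at the origin, whereas the abstract describes built-in low-pass filters there. The names and parameter values are illustrative assumptions.

```python
import math


def log_gabor_response(f: float, f0: float, sigma_ratio: float) -> float:
    """Classic 1D log-Gabor response: a Gaussian in log-frequency centred at f0.

    sigma_ratio (0 < sigma_ratio < 1) controls the bandwidth.
    The response is defined as 0 for f <= 0 (no DC component).
    """
    if f <= 0:
        return 0.0
    log_ratio = math.log(f / f0)
    return math.exp(-(log_ratio ** 2) / (2 * math.log(sigma_ratio) ** 2))


# A small filter bank: centre frequencies spaced one octave apart.
centres = [0.05 * 2 ** k for k in range(3)]
bank = [[log_gabor_response(f, f0, 0.65) for f in (0.05, 0.1, 0.2)] for f0 in centres]
```

Because the Gaussian lives on a logarithmic axis, each filter responds equally to frequencies one octave above and one octave below its centre.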
arXiv:2402.10198v1 Announce Type: new
Abstract: Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses the current state-of-the-art model TSMixer by 14.33% on average, while having ~4 times fewer parameters. The code is available at https://github.com/romilbert/samformer.
( 2
min )
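The abstract above relies on sharpness-aware optimization. As a rough illustration of the generic sharpness-aware minimization (SAM) update, and not of SAMformer itself, the sketch below applies it to a toy quadratic loss: take the gradient, step a distance `rho` in the gradient direction to reach a nearby "worst-case" point, then update the original weights using the gradient computed at that point. The loss, curvatures, and hyperparameters are assumptions chosen only for the example.

```python
import math

CURVATURES = [1.0, 10.0]  # toy loss: f(w) = 0.5 * sum(c_i * w_i^2)


def loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURES, w))


def grad(w):
    return [c * wi for c, wi in zip(CURVATURES, w)]


def sam_step(w, lr=0.05, rho=0.01):
    """One SAM update: ascend by rho along the normalized gradient, then descend."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point
    w_perturbed = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_perturbed = grad(w_perturbed)
    # The descent step is applied to the original weights, not the perturbed ones.
    return [wi - lr * gi for wi, gi in zip(w, g_perturbed)]


w0 = [1.0, 1.0]
w = list(w0)
for _ in range(20):
    w = sam_step(w)
```

In a real training loop the two gradient evaluations would be two forward/backward passes over the same mini-batch, which is why SAM roughly doubles the per-step cost.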
arXiv:2402.10145v1 Announce Type: new
Abstract: Federated Learning is a machine learning approach that enables the training of a deep learning model among several participants with sensitive data who wish to share their own knowledge without compromising the privacy of their data. In this research, the authors employ a secured Federated Learning method with an additional layer of privacy and propose a method for addressing the non-IID challenge. Moreover, differential privacy is compared with chaotic-based encryption as a layer of privacy. The experimental approach assesses the performance of the federated deep learning model with differential privacy using both IID and non-IID data. In each experiment, the Federated Learning process improves the average performance metrics of the deep neural network, even in the case of non-IID data.
( 2
min )
arXiv:2402.10076v1 Announce Type: new
Abstract: We introduce QUICK, a group of novel optimized CUDA kernels for the efficient inference of quantized Large Language Models (LLMs). QUICK addresses the shared memory bank-conflict problem of state-of-the-art mixed precision matrix multiplication kernels. Our method interleaves the quantized weight matrices of LLMs offline to skip the shared memory write-back after the dequantization. We demonstrate up to 1.91x speedup over existing kernels of AutoAWQ on larger batches and up to 1.94x throughput gain on representative LLMs on various NVIDIA GPU devices.
( 2
min )
arXiv:2402.09529v1 Announce Type: new
Abstract: We introduce the manifold density function, which is an intrinsic method to validate manifold learning techniques. Our approach adapts and extends Ripley's $K$-function, and categorizes in an unsupervised setting the extent to which an output of a manifold learning algorithm captures the structure of a latent manifold. Our manifold density function generalizes to broad classes of Riemannian manifolds. In particular, we extend the manifold density function to general two-manifolds using the Gauss-Bonnet theorem, and demonstrate that the manifold density function for hypersurfaces is well approximated using the first Laplacian eigenvalue. We prove desirable convergence and robustness properties.
( 2
min )
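The manifold density function in the abstract above adapts Ripley's $K$-function. For reference, the sketch below computes the classical planar estimate without edge correction, $\hat K(r) = \frac{A}{n^2} \sum_{i \ne j} \mathbf{1}[d_{ij} \le r]$, where $A$ is the area of the observation window. This is the textbook Euclidean version, not the paper's manifold extension, and the point set is made up for the example.

```python
import math


def ripley_k(points, r: float, area: float) -> float:
    """Naive Ripley's K estimate (no edge correction) for 2D points.

    Counts ordered pairs (i, j), i != j, whose distance is at most r,
    then scales by area / n^2 (intensity estimated as n / area).
    """
    n = len(points)
    close_pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= r
    )
    return area * close_pairs / (n * n)


# Four corners of a unit square, observed in a window of area 4.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_values = [ripley_k(corners, r, area=4.0) for r in (0.5, 1.0, 1.5)]
```

Under complete spatial randomness the planar $K$-function is $\pi r^2$, so departures from that curve indicate clustering or regularity at scale $r$.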
arXiv:2402.10198v1 Announce Type: cross
Abstract: Transformer-based architectures achieved breakthrough performance in natural language processing and computer vision, yet they remain inferior to simpler linear baselines in multivariate long-term forecasting. To better understand this phenomenon, we start by studying a toy linear forecasting problem for which we show that transformers are incapable of converging to their true solution despite their high expressive power. We further identify the attention of transformers as being responsible for this low generalization capacity. Building upon this insight, we propose a shallow lightweight transformer model that successfully escapes bad local minima when optimized with sharpness-aware optimization. We empirically demonstrate that this result extends to all commonly used real-world multivariate time series datasets. In particular, SAMformer surpasses the current state-of-the-art model TSMixer by 14.33% on average, while having ~4 times fewer parameters. The code is available at https://github.com/romilbert/samformer.
( 2
min )
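The sharpness-aware optimization mentioned above can be illustrated with a generic sharpness-aware minimization (SAM) step: move to a nearby worst-case point along the normalized gradient, then update the original weights with the gradient measured there. This is a sketch on a toy quadratic loss, not the SAMformer training code; the loss, `lr`, and `rho` are assumptions chosen for illustration.

```python
import math

def loss(w):
    # Toy anisotropic quadratic standing in for a training loss (assumption).
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)

def grad(w):
    # Exact gradient of the toy loss above.
    return [10.0 * w[0], w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    # 1) Ascent step: perturb the weights by rho along the normalized gradient.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # 2) Descent step: update the original weights with the gradient at w_adv.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
start = loss(w)
for _ in range(100):
    w = sam_step(w)
final = loss(w)
```

Because each update uses the gradient at the perturbed point, the iterates settle in a small neighborhood of the minimum whose size scales with `rho`, rather than converging to it exactly.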
arXiv:2402.09807v1 Announce Type: cross
Abstract: In this paper, we propose a Minimax Trust Region (MINIMAX-TR) algorithm and a Minimax Trust Region Algorithm with Contractions and Expansions (MINIMAX-TRACE) algorithm for solving nonconvex-strongly concave minimax problems. Both algorithms can find an $(\epsilon, \sqrt{\epsilon})$-second order stationary point (SSP) within $\mathcal{O}(\epsilon^{-1.5})$ iterations, which matches the best known iteration complexity.
( 2
min )
Generative AI and software-defined computing are transforming the automotive landscape — making the journey behind the wheel safer, smarter and more enjoyable. Dozens of automakers and NVIDIA DRIVE ecosystem partners will be demonstrating their developments in mobility, along with showcasing their next-gen vehicles at GTC, the conference for the era of AI, running from March
( 5
min )
Today, we are excited to announce that Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. […]
( 9
min )
arXiv:2402.08711v1 Announce Type: cross
Abstract: A method for analyzing non-asymptotic guarantees of numerical discretizations of ergodic SDEs in Wasserstein-2 distance is presented by Sanz-Serna and Zygalakis in "Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations". They analyze the UBU integrator, which is strong order two and only requires one gradient evaluation per step, resulting in desirable non-asymptotic guarantees, in particular $\mathcal{O}(d^{1/4}\epsilon^{-1/2})$ steps to reach a distance of $\epsilon > 0$ in Wasserstein-2 distance away from the target distribution. However, there is a mistake in the local error estimates in Sanz-Serna and Zygalakis (2021); in particular, a stronger assumption is needed to achieve these complexity estimates. This note reconciles the theory with the dimension dependence observed in practice in many applications of interest.
( 2
min )
arXiv:2402.09249v1 Announce Type: new
Abstract: Neural networks are the state-of-the-art approach for many tasks and the activation function is one of the main building blocks that allow such performance. Recently, a novel transformative adaptive activation function (TAAF) allowing for any vertical and horizontal translation and scaling was proposed. This work sets the TAAF into the context of other activation functions. It shows that the TAAFs generalize over 50 existing activation functions and utilize similar concepts as over 70 other activation functions, underscoring the versatility of TAAFs. This comprehensive exploration positions TAAFs as a promising and adaptable addition to neural networks.
( 2
min )
arXiv:2402.09046v1 Announce Type: cross
Abstract: Inspired by Bayesian approaches to brain function in neuroscience, we give a simple theory of probabilistic inference for a unified account of reasoning and learning. We simply model how data cause symbolic knowledge in terms of its satisfiability in formal logic. The underlying idea is that reasoning is a process of deriving symbolic knowledge from data via abstraction, i.e., selective ignorance. The logical consequence relation is discussed for its proof-based theoretical correctness. The MNIST dataset is discussed for its experiment-based empirical correctness.
( 2
min )
arXiv:2402.09358v1 Announce Type: cross
Abstract: This study demonstrates the first in-hospital adaptation of a cloud-based AI, similar to ChatGPT, into a secure model for analyzing radiology reports, prioritizing patient data privacy. By employing a unique sentence-level knowledge distillation method through contrastive learning, we achieve over 95% accuracy in detecting anomalies. The model also accurately flags uncertainties in its predictions, enhancing its reliability and interpretability for physicians with certainty indicators. These advancements represent significant progress in developing secure and efficient AI tools for healthcare, suggesting a promising future for in-hospital AI applications with minimal supervision.
( 2
min )
arXiv:2402.08992v1 Announce Type: cross
Abstract: This paper proposes a stochastic proximal point method to solve a stochastic convex composite optimization problem. High probability results in stochastic optimization typically hinge on restrictive assumptions on the stochastic gradient noise, for example, sub-Gaussian distributions. Assuming only weak conditions such as bounded variance of the stochastic gradient, this paper establishes a low sample complexity to obtain a high probability guarantee on the convergence of the proposed method. Additionally, a notable aspect of this work is the development of a subroutine to solve the proximal subproblem, which also serves as a novel technique for variance reduction.
( 2
min )
arXiv:2402.09236v1 Announce Type: new
Abstract: To build intelligent machine learning systems, there are two broad approaches. One approach is to build inherently interpretable models, as endeavored by the growing field of causal representation learning. The other approach is to build highly-performant foundation models and then invest efforts into understanding how they work. In this work, we relate these two approaches and study how to learn human-interpretable concepts from data. Weaving together ideas from both fields, we formally define a notion of concepts and show that they can be provably recovered from diverse data. Experiments on synthetic data and large language models show the utility of our unified approach.
( 2
min )
arXiv:2402.08948v1 Announce Type: new
Abstract: In this work, we study the mean-field flow for learning subspace-sparse polynomials using stochastic gradient descent and two-layer neural networks, where the input distribution is standard Gaussian and the output only depends on the projection of the input onto a low-dimensional subspace. We propose a basis-free generalization of the merged-staircase property in Abbe et al. (2022) and establish a necessary condition for the SGD-learnability. In addition, we prove that the condition is almost sufficient, in the sense that a condition slightly stronger than the necessary condition can guarantee the exponential decay of the loss functional to zero.
( 2
min )
arXiv:2402.08923v1 Announce Type: new
Abstract: This paper presents a novel approach for predicting human poses using IMU data, diverging from previous studies such as DIP-IMU, IMUPoser, and TransPose, which use up to 6 IMUs in conjunction with bidirectional RNNs. We introduce two main innovations: a data-driven strategy for optimal IMU placement and a transformer-based model architecture for time series analysis. Our findings indicate that our approach not only outperforms traditional 6 IMU-based biRNN models but also that the transformer architecture significantly enhances pose reconstruction from data obtained from 24 IMU locations, with equivalent performance to biRNNs when using only 6 IMUs. The enhanced accuracy provided by our optimally chosen locations, when coupled with the parallelizability and performance of transformers, provides significant improvements to the field of IMU-based pose estimation.
( 2
min )
It’s been five years since the telecommunications industry first deployed 5G networks to drive new performance levels for customers and unlock new value for telcos. But that industry milestone has been overshadowed by the emergence of generative AI and the swift pace at which telcos are embracing large language models as they seek to transform
( 7
min )
Adobe is putting generative AI into the hands of creators with Adobe Firefly — powered by NVIDIA in the cloud — and adding to its impressive app lineup with exciting new features.
( 7
min )
Providing a peek at the architecture powering advanced AI factories, NVIDIA Thursday released a video that offers the first public look at Eos, its latest data-center-scale supercomputer. An extremely large-scale NVIDIA DGX SuperPOD, Eos is where NVIDIA developers create their AI breakthroughs using accelerated computing infrastructure and fully optimized software. Eos is built with 576
( 5
min )
GFN Thursday keeps its fourth anniversary celebrations rolling by bringing Ubisoft’s Skull and Bones and Microsoft’s Halo Infinite to the cloud this week. They’re part of five newly supported games, and thanks to the power of the cloud, members can play them at unrivaled quality across nearly any device. The Ultimate Upgrade, Instantly When GeForce
( 6
min )
With the use of cloud computing, big data and machine learning (ML) tools like Amazon Athena or Amazon SageMaker have become available and usable by anyone without much effort in creation and maintenance. Industrial companies increasingly look at data analytics and data-driven decision-making to increase resource efficiency across their entire portfolio, from operations to performing […]
( 12
min )
arXiv:2401.15719v2 Announce Type: replace-cross
Abstract: We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
( 2
min )
arXiv:2402.08662v1 Announce Type: cross
Abstract: We present a minimal phase oscillator model for learning quadrupedal locomotion. Each of the four oscillators is coupled only to itself and its corresponding leg through local feedback of the ground reaction force, which can be interpreted as an observer feedback gain. We interpret the oscillator itself as a latent contact state-estimator. Through a systematic ablation study, we show that the combination of phase observations, simple phase-based rewards, and the local feedback dynamics induces policies that exhibit emergent gait preferences, while using a reduced set of simple rewards, and without prescribing a specific gait. The code is open-source, and a video synopsis is available at https://youtu.be/1NKQ0rSV3jU.
( 2
min )
arXiv:2402.08108v1 Announce Type: cross
Abstract: We propose a new method for finding statistical arbitrages that can contain more assets than just the traditional pair. We formulate the problem as seeking a portfolio with the highest volatility, subject to its price remaining in a band and a leverage limit. This optimization problem is not convex, but can be approximately solved using the convex-concave procedure, a specific sequential convex programming method. We show how the method generalizes to finding moving-band statistical arbitrages, where the price band midpoint varies over time.
( 2
min )
arXiv:2402.08082v1 Announce Type: cross
Abstract: While score-based generative models (SGMs) have achieved remarkable success in enormous image generation tasks, their mathematical foundations are still limited. In this paper, we analyze the approximation and generalization of SGMs in learning a family of sub-Gaussian probability distributions. We introduce a notion of complexity for probability distributions in terms of their relative density with respect to the standard Gaussian measure. We prove that if the log-relative density can be locally approximated by a neural network whose parameters can be suitably bounded, then the distribution generated by empirical score matching approximates the target distribution in total variation with a dimension-independent rate. We illustrate our theory through examples, which include certain mixtures of Gaussians. An essential ingredient of our proof is to derive a dimension-free deep neural network approximation rate for the true score function associated with the forward process, which is interesting in its own right.
( 2
min )
arXiv:2402.08676v1 Announce Type: new
Abstract: Motivated by the recent application of approximate message passing (AMP) to the analysis of convex optimizations in multi-class classifications [Loureiro et al., 2021], we present a convergence analysis of AMP dynamics with non-separable multivariate nonlinearities. As an application, we present a complete (and independent) analysis of the motivated convex optimization problem.
( 2
min )
arXiv:2402.08491v1 Announce Type: new
Abstract: Cellular reprogramming can be used for both the prevention and cure of different diseases. However, the efficiency of discovering reprogramming strategies with classical wet-lab experiments is hindered by lengthy time commitments and high costs. In this study, we develop a novel computational framework based on deep reinforcement learning that facilitates the identification of reprogramming strategies. For this aim, we formulate a control problem in the context of cellular reprogramming for the frameworks of BNs and PBNs under the asynchronous update mode. Furthermore, we introduce the notion of a pseudo-attractor and a procedure for identification of pseudo-attractor states during training. Finally, we devise a computational framework for solving the control problem, which we test on a number of different models.
( 2
min )
arXiv:2402.08056v1 Announce Type: new
Abstract: The MIML library is a Java software tool to develop, test, and compare classification algorithms for multi-instance multi-label (MIML) learning. The library includes 43 algorithms and provides a specific format and facilities for data managing and partitioning, holdout and cross-validation methods, standard metrics for performance evaluation, and generation of reports. In addition, algorithms can be executed through XML configuration files without needing to program. It is platform-independent, extensible, free, open-source, and available on GitHub under the GNU General Public License.
( 2
min )
arXiv:2402.08543v1 Announce Type: cross
Abstract: Despite a large and significant body of recent work focused on estimating the out-of-sample risk of regularized models in the high dimensional regime, a theoretical understanding of this problem for non-differentiable penalties such as generalized LASSO and nuclear norm is missing. In this paper we resolve this challenge. We study this problem in the proportional high dimensional regime where both the sample size n and number of features p are large, and n/p and the signal-to-noise ratio (per observation) remain finite. We provide finite sample upper bounds on the expected squared error of leave-one-out cross-validation (LO) in estimating the out-of-sample risk. The theoretical framework presented here provides a solid foundation for elucidating empirical findings that show the accuracy of LO.
( 2
min )
arXiv:2006.06530v2 Announce Type: cross
Abstract: We sample aggravated cases following age-structured probabilities from confirmed cases and use ICU occupation data to find a subnotification factor. A logistic fit is then employed to project the progression of the COVID-19 epidemic with plateau scenarios taken from locations that have reached this stage. Finally, the logistic curve found is corrected by the subnotification factor and sampled to project the future demand for ICU beds.
( 2
min )
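The correction step described above, scaling a fitted logistic curve by a subnotification factor, can be sketched as follows. The parameter names and the numeric values are hypothetical and are not taken from the paper; the age-structured sampling of aggravated cases is not reproduced.

```python
import math

def logistic(t, plateau, rate, midpoint):
    # Standard logistic curve: tends to `plateau` as t grows and
    # equals plateau / 2 at t == midpoint.
    return plateau / (1.0 + math.exp(-rate * (t - midpoint)))

def corrected_cases(t, plateau, rate, midpoint, subnotification):
    # Scale the fitted curve of confirmed cases by the subnotification
    # factor to estimate the actual cumulative number of cases.
    return subnotification * logistic(t, plateau, rate, midpoint)

# Hypothetical values for illustration only.
confirmed_at_midpoint = logistic(30, plateau=1000.0, rate=0.2, midpoint=30)
estimated_at_midpoint = corrected_cases(
    30, plateau=1000.0, rate=0.2, midpoint=30, subnotification=4.0
)
```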
The fusion of the physical and digital worlds is reshaping the automotive industry. NVIDIA’s automotive partners are using digitalization to transform every phase of the product lifecycle — evolving primarily physical, manual processes into software-driven, AI-enhanced digital systems. Watch the video to learn more. Digitalization: A Game Changer From End to End Kaivan Karimi, global
( 5
min )
Thanks to their work driving AI forward, Akshit Arora and Rafael Valle could someday speak to their spouses’ families in their native languages. Arora and Valle — along with colleagues Sungwon Kim and Rohan Badlani — won the LIMMITS ’24 challenge which asks contestants to recreate in real time a speaker’s voice in English or
( 7
min )
NASCAR races are all about speed, but even the fastest cars need to factor in safety, especially as rules and tracks change. The Ohio Supercomputer Center is ready to help. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with Alan Chalker, the director of strategic programs at the OSC, about all things
( 5
min )
Effective self-service options are becoming increasingly critical for contact centers, but implementing them well presents unique challenges. Amazon Lex provides your Amazon Connect contact center with chatbot functionalities such as automatic speech recognition (ASR) and natural language understanding (NLU) capabilities through voice and text channels. The bot takes natural language speech or text input, recognizes […]
( 10
min )
Pose estimation is a computer vision technique that detects a set of points on objects (such as people or vehicles) within images or videos. Pose estimation has real-world applications in sports, robotics, security, augmented reality, media and entertainment, medical applications, and more. Pose estimation models are trained on images or videos that are annotated with […]
( 16
min )
With the advent of generative AI solutions, organizations are finding different ways to apply these technologies to gain an edge over their competitors. Intelligent applications, powered by advanced foundation models (FMs) trained on huge datasets, can now understand natural language, interpret meaning and intent, and generate contextually relevant and human-like responses. This is fueling innovation across […]
( 10
min )
Innovative AI system from MIT CSAIL melds simulations and physical testing to forge materials with newfound durability and flexibility for diverse engineering uses.
( 5
min )
We terminated accounts associated with state-affiliated threat actors. Our findings show our models offer only limited, incremental capabilities for malicious cybersecurity tasks.
( 3
min )
Machine learning (ML) models can memorize training datasets. As a result, training ML models over private datasets can lead to the violation of individuals' privacy. Differential privacy (DP) is a rigorous privacy notion to preserve the privacy of underlying training datasets. Yet, training ML models in a DP framework usually degrades the accuracy of ML models. This paper aims to boost the accuracy of a DP logistic regression (LR) via a pre-training module. In more detail, we initially pre-train our LR model on a public training dataset that raises no privacy concerns. Then, we fine-tune our DP-LR model with the private dataset. In the numerical results, we show that adding a pre-training module significantly improves the accuracy of the DP-LR model.
( 2
min )
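The two-stage recipe above (pre-train on public data, then fine-tune privately) might look like the sketch below. The abstract does not state which DP mechanism is used, so the per-example gradient clipping with Gaussian noise here is an assumption, as are all function names, datasets, and hyperparameters. No privacy accounting is performed, so the sketch makes no claim about the resulting privacy budget.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def example_grad(w, x, y):
    # Gradient of the logistic loss for one example (label y in {0, 1}).
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

def clip(g, max_norm):
    # Rescale g so that its L2 norm is at most max_norm.
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [gi * scale for gi in g]

def pretrain(w, data, lr=0.5, epochs=200):
    # Stage 1: plain, non-private gradient descent on the public dataset.
    for _ in range(epochs):
        grads = [example_grad(w, x, y) for x, y in data]
        mean = [sum(col) / len(data) for col in zip(*grads)]
        w = [wi - lr * gi for wi, gi in zip(w, mean)]
    return w

def dp_finetune(w, data, rng, lr=0.1, steps=50, max_norm=1.0, noise_mult=1.0):
    # Stage 2 (assumed mechanism): clip each per-example gradient, sum,
    # add Gaussian noise scaled to the clipping norm, then average.
    for _ in range(steps):
        clipped = [clip(example_grad(w, x, y), max_norm) for x, y in data]
        noisy = [sum(col) + rng.gauss(0.0, noise_mult * max_norm)
                 for col in zip(*clipped)]
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, noisy)]
    return w

# Hypothetical toy datasets; each input is [bias term, feature].
public = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
private = [([1.0, -1.5], 0), ([1.0, -0.5], 0), ([1.0, 0.5], 1), ([1.0, 1.5], 1)]

w_public = pretrain([0.0, 0.0], public)
w_private = dp_finetune(w_public, private, random.Random(0))
```

Starting the private stage from `w_public` rather than from zeros is the point of the pre-training module: fewer noisy steps are needed to reach a useful model.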
Blindness and other eye diseases are a global health concern, particularly in low- and middle-income countries like India. In this regard, during the COVID-19 pandemic, teleophthalmology became a lifeline, and the Grabi attachment for smartphone-based eye imaging gained in use. However, the quality of user-captured images often remained inadequate, requiring clinician vetting and causing delays. Against this backdrop, we propose an AI-based quality assessment system with instant feedback mimicking clinicians' judgments and test it on patient-captured images. Dividing the complex problem hierarchically, here we tackle a nontrivial part, and demonstrate a proof of concept.
( 2
min )
Graphon games have been introduced to study games with many players who interact through a weighted graph of interaction. By passing to the limit, a game with a continuum of players is obtained, in which the interactions are through a graphon. In this paper, we focus on a graphon game for optimal investment under relative performance criteria, and we propose a deep learning method. The method builds upon two key ingredients: first, a characterization of Nash equilibria by forward-backward stochastic differential equations and, second, recent advances of machine learning algorithms for stochastic differential games. We provide numerical experiments on two different financial models. In each model, we compare the effect of several graphons, which correspond to different structures of interactions.
( 2
min )
We study the asymptotic behavior of second-order algorithms mixing Newton's method and inertial gradient descent in non-convex landscapes. We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points. We also evidence the role played by the hyper-parameters of these methods in their qualitative behavior near critical points. The theoretical results are supported by numerical illustrations.
( 2
min )
We consider bounded discrete time series. From the statistical features of such a series, and without any use of the Fourier transform, we find a suitable almost periodic function that approximates it in a local time interval.
( 2
min )
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
( 2
min )
Different diseases, such as histological subtypes of breast lesions, have severely varying incidence rates. Even when trained with a substantial amount of in-distribution (ID) data, models often encounter out-of-distribution (OOD) samples belonging to unseen classes in clinical reality. To address this, we propose a novel framework built upon a long-tailed OOD detection task for breast ultrasound images. It is equipped with a triplet state augmentation (TriAug), which improves ID classification accuracy while maintaining a promising OOD detection performance. Meanwhile, we design a balanced sphere loss to handle the class imbalance problem.
( 2
min )
This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pretrained (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of elements which differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.
( 2
min )
The advancement of Large Language Models (LLMs) has led to a corresponding proliferation of their applications. Software design, being one of them, has benefited tremendously from using LLMs as an interface component that extends fixed user stories. However, the inclusion of LLM-based AI agents in software design often poses unexpected challenges, especially in the estimation of development effort. Through the example of UI-based user stories, we provide a comparison against traditional methods and propose a new way to enhance specifications of natural language-based questions that allows for the estimation of development effort by taking into account data sources, interfaces and algorithms.
( 2
min )
This work proposes a class of locally differentially private mechanisms for linear queries, in particular range queries, that leverages correlated input perturbation to simultaneously achieve unbiasedness, consistency, statistical transparency, and control over utility requirements in terms of accuracy targets expressed either in certain query margins or as implied by the hierarchical database structure. The proposed Cascade Sampling algorithm instantiates the mechanism exactly and efficiently. Our bounds show that we obtain near-optimal utility while being empirically competitive against output perturbation methods.
( 2
min )
This paper introduces a novel generative model for discrete distributions based on continuous normalizing flows on the submanifold of factorizing discrete measures. Integration of the flow gradually assigns categories and avoids issues of discretizing the latent continuous model, such as rounding and sample truncation. General non-factorizing discrete distributions capable of representing complex statistical dependencies of structured discrete data can be approximated by embedding the submanifold into the meta-simplex of all joint discrete distributions and data-driven averaging. Efficient training of the generative model is demonstrated by matching the flow of geodesics of factorizing discrete distributions. Various experiments underline the approach's broad applicability.
( 2
min )
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023). Our analysis provides new theoretical results on categorical approaches to distributional RL, and also introduces a new distributional Bellman equation, the stochastic categorical CDF Bellman equation, which we expect to be of independent interest. We also provide an experimental study comparing several model-based distributional RL algorithms, with several takeaways for practitioners.
( 2
min )
We present General Time Transformer (GTT), an encoder-only style foundation model for zero-shot multivariate time series forecasting. GTT is pretrained on a large dataset of 200M high-quality time series samples spanning diverse domains. In our proposed framework, the task of multivariate time series forecasting is formulated as a channel-wise next curve shape prediction problem, where each time series sample is represented as a sequence of non-overlapping curve shapes with a unified numerical magnitude. GTT is trained to predict the next curve shape based on a window of past curve shapes in a channel-wise manner. Experimental results demonstrate that GTT exhibits superior zero-shot multivariate forecasting capabilities on unseen time series datasets, even surpassing state-of-the-art supervised baselines. Additionally, we investigate the impact of varying GTT model parameters and training dataset scales, observing that the scaling law also holds in the context of zero-shot multivariate time series forecasting.
( 2
min )
We present a novel deep-learning-based method to cluster words in documents which we apply to detect and recognize tables given the OCR output. We interpret table structure bottom-up as a graph of relations between pairs of words (belonging to the same row, column, header, as well as to the same table) and use a transformer encoder model to predict its adjacency matrix. We demonstrate the performance of our method on the PubTables-1M dataset as well as PubTabNet and FinTabNet datasets. Compared to the current state-of-the-art detection methods such as DETR and Faster R-CNN, our method achieves similar or better accuracy, while requiring a significantly smaller model.
( 2
min )
This is an expository article on score-based diffusion models, with a particular focus on the formulation via stochastic differential equations (SDE). After a gentle introduction, we discuss the two pillars of diffusion modeling, sampling and score matching, which encompass SDE/ODE sampling, score matching efficiency, the consistency model, and reinforcement learning. Short proofs are given to illustrate the main idea of the stated results. The article is primarily intended to introduce beginners to the field, and practitioners may also find some of the analysis useful in designing new models or algorithms.
( 2
min )
A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems for which various solution methods exist. This enables us to establish convergence guarantees while providing an upper bound on the distance to undiscovered Pareto optimal solutions at each step. Empirical evaluations demonstrate that IPRO matches or outperforms methods that require additional domain knowledge. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as in pathfinding and optimisation.
( 2
min )
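To make the notion of a Pareto front in the abstract above concrete, here is a minimal sketch of a dominance filter over a finite set of return vectors. It is not IPRO, whose details the abstract does not give. The function names and the sample return vectors are hypothetical, and maximisation of every objective is assumed.

```python
def dominates(u, v):
    """True if u is at least as good as v in every objective and strictly
    better in at least one (maximisation is assumed)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(points):
    """Return the non-dominated points, in their original order."""
    unique = list(dict.fromkeys(points))  # drop exact duplicates, keep order
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


# Hypothetical two-objective returns of five candidate policies.
returns = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0)]
front = pareto_front(returns)
```

This filter only applies once candidate policies have already been evaluated. Finding those candidates is the hard part, which the abstract says IPRO handles through a sequence of single-objective problems.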
Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on developing efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed by the state-of-the-art linear programming (LP)-based online algorithms. This paper establishes several important facts about online linear programming, which unveil the challenge for first-order-method-based online algorithms to achieve beyond $\mathcal{O}(\sqrt{T})$ regret. To address the challenge, we introduce a new algorithmic framework that decouples learning from decision-making. More importantly, for the first time, we show that first-order methods can attain regret $\mathcal{O}(T^{1/3})$ with this new framework. Lastly, we conduct numerical experiments to validate our theoretical findings.
( 2
min )
This work compares the performance of four clustering techniques with the objective of achieving strong hybrid models in supervised learning tasks. A real dataset has been collected from a bio-climatic house named Sotavento, placed on an experimental wind farm located in Xermade (Lugo), Galicia (Spain). The authors have chosen the thermal solar generation system in order to study how well several clustering methods perform when followed by a regression technique to predict the output temperature of the system. To assess the quality of each clustering method, two possible solutions have been implemented. The first is based on three unsupervised learning metrics (Silhouette, Calinski-Harabasz and Davies-Bouldin), while the second employs the most common error measurements for a regression algorithm such as the Multi Layer Perceptron.
( 2
min )
Several methods have been proposed for correcting the elevation bias in digital elevation models (DEMs), for example, linear regression. Nowadays, supervised machine learning enables the modelling of complex relationships between variables, and has been deployed by researchers in a variety of fields. In the existing literature, several studies have adopted either machine learning or statistical approaches in the task of DEM correction. However, to our knowledge, none of these studies have compared the performance of both approaches, especially with regard to open-access global DEMs. Our previous work has already shown the potential of machine learning approaches, specifically gradient boosted decision trees (GBDTs), for DEM correction. In this study, we share some results from the comparison of three recent implementations of gradient boosted decision trees (XGBoost, LightGBM and CatBoost) versus multiple linear regression (MLR) for enhancing the vertical accuracy of 30 m Copernicus and AW3D global DEMs in Cape Town, South Africa.
( 2
min )
Using electronic health records data and machine learning to guide future decisions needs to address challenges, including 1) long/short-term dependencies and 2) interactions between diseases and interventions. Bidirectional transformers have effectively addressed the first challenge. Here we tackled the latter challenge by masking one source (e.g., ICD10 codes) and training the transformer to predict it using other sources (e.g., ATC codes).
( 2
min )
We investigate the test risk of continuous-time stochastic gradient flow dynamics in learning theory. Using a path integral formulation we provide, in the regime of a small learning rate, a general formula for computing the difference between test risk curves of pure gradient and stochastic gradient flows. We apply the general theory to a simple model of weak features, which displays the double descent phenomenon, and explicitly compute the corrections brought about by the added stochastic term in the dynamics, as a function of time and model parameters. The analytical results are compared to simulations of discrete-time stochastic gradient descent and show good agreement.
( 2
min )
This paper introduces a novel generative model for discrete distributions based on continuous normalizing flows on the submanifold of factorizing discrete measures. Integration of the flow gradually assigns categories and avoids issues of discretizing the latent continuous model, such as rounding and sample truncation. General non-factorizing discrete distributions capable of representing complex statistical dependencies of structured discrete data can be approximated by embedding the submanifold into the meta-simplex of all joint discrete distributions and data-driven averaging. Efficient training of the generative model is demonstrated by matching the flow of geodesics of factorizing discrete distributions. Various experiments underline the approach's broad applicability.
( 2
min )
We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023). Our analysis provides new theoretical results on categorical approaches to distributional RL, and also introduces a new distributional Bellman equation, the stochastic categorical CDF Bellman equation, which we expect to be of independent interest. We also provide an experimental study comparing several model-based distributional RL algorithms, with several takeaways for practitioners.
( 2
min )
Perhaps the greatest challenge – and opportunity – of LLMs is extending their powerful capabilities to solve problems beyond the data on which they have been trained, and to achieve comparable results with data the LLM has never seen. This opens new possibilities in data investigation, such as identifying themes and semantic concepts with context […]
The post GraphRAG: Unlocking LLM discovery on narrative private data appeared first on Microsoft Research.
( 15
min )
This post is co-written with Santosh Waddi and Nanda Kishore Thatikonda from BigBasket. BigBasket is India’s largest online food and grocery store. They operate in multiple ecommerce channels such as quick commerce, slotted delivery, and daily subscriptions. You can also buy from their physical stores and vending machines. They offer a large assortment of over […]
( 9
min )
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used […]
( 11
min )
Chatbots are used by millions of people around the world every day, powered by NVIDIA GPU-based cloud servers. Now, these groundbreaking tools are coming to Windows PCs powered by NVIDIA RTX for local, fast, custom generative AI. Chat with RTX, now free to download, is a tech demo that lets users personalize a chatbot with
( 6
min )
Researchers developed a simple yet effective solution for a puzzling problem that can worsen the performance of large language models such as ChatGPT.
( 7
min )
We propose a novel method for privacy-preserving deep neural networks (DNNs) with the Vision Transformer (ViT). The method allows us not only to train and test models with visually protected images but also to avoid the performance degradation caused by the use of encrypted images, whereas conventional methods cannot avoid the influence of image encryption. A domain adaptation method is used to efficiently fine-tune ViT with encrypted images. In experiments, the method is demonstrated to outperform conventional methods in an image classification task on the CIFAR-10 and ImageNet datasets in terms of classification accuracy.
( 2
min )
We demonstrate a multiplication method based on numbers represented as a set of polynomial radix 2 indices stored as an integer list. The 'polynomial integer index multiplication' method is a set of algorithms implemented in Python code. We demonstrate the method to be faster than both the Number Theoretic Transform (NTT) and Karatsuba for multiplication within a certain bit range; both were also implemented in Python code for comparison with the polynomial radix 2 integer method. We demonstrate that it is possible to express any integer or real number as a list of integer indices, representing a finite series in base two. The finite series of integer index representation of a number can then be stored and distributed across multiple CPUs / GPUs. We show that operations of addition and multiplication can be applied as two's complement additions operating on the index integer representations and can be fully distributed across a given CPU / GPU architecture. We demonstrate fully distributed arithmetic operations such that the 'polynomial integer index multiplication' method overcomes the current limitation of parallel multiplication methods, i.e., the need to share common core memory and common disk for the calculation of results and intermediate results.
( 3
min )
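The index-list representation described above can be sketched in a few lines. This is a minimal illustration, assuming non-negative integers and a plain carry-propagating sum of the pairwise index sums. It is not the authors' implementation, makes no claim about their performance results, and all function names are hypothetical.

```python
def to_indices(n):
    """Exponents of the set bits of a non-negative integer, lowest first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return [i for i in range(n.bit_length()) if (n >> i) & 1]


def from_indices(indices):
    """Rebuild the integer as the finite base-two series sum(2**i)."""
    return sum(1 << i for i in indices)


def multiply_indices(a, b):
    """Multiply two index lists: each pair (i, j) contributes 2**(i + j)."""
    counts = {}
    for i in a:
        for j in b:
            counts[i + j] = counts.get(i + j, 0) + 1
    if not counts:
        return []  # one of the operands was zero
    result, carry = [], 0
    k, top = min(counts), max(counts)
    while k <= top or carry:
        total = counts.get(k, 0) + carry
        if total & 1:
            result.append(k)  # bit k is set in the product
        carry = total >> 1    # the remainder carries into bit k + 1
        k += 1
    return result


product = multiply_indices(to_indices(123456789), to_indices(987654321))
```

Each pairwise index sum `i + j` is independent of the others, so under the assumptions of this sketch that accumulation could be split across workers before the final carry pass.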
This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions are two adaptive stopping criteria for the subproblem. The convergence behavior of the proposed methods is analyzed. Through experiments on both synthetic and real datasets, we demonstrate that our methods are much more efficient than existing methods, such as the original proximal linear algorithm and the subgradient method.
( 2
min )
We show that the Rademacher complexity-based approach can generate non-vacuous generalisation bounds on Convolutional Neural Networks (CNNs) for classifying a small number of classes of images. The development of new Talagrand's contraction lemmas for high-dimensional mappings between function spaces and CNNs for general Lipschitz activation functions is a key technical contribution. Our results show that the Rademacher complexity does not depend on the network length for CNNs with some special types of activation functions such as ReLU, Leaky ReLU, Parametric Rectifier Linear Unit, Sigmoid, and Tanh.
( 2
min )
We study the online learnability of hypothesis classes with respect to arbitrary, but bounded loss functions. No characterization of online learnability is known at this level of generality. We give a new scale-sensitive combinatorial dimension, named the sequential minimax dimension, and show that it gives a tight quantitative characterization of online learnability. In addition, we show that the sequential minimax dimension subsumes most existing combinatorial dimensions in online learning theory.
( 2
min )
We introduce a new dataset named WikiVitals which contains a large graph of 48k mutually referred Wikipedia articles classified into 32 categories and connected by 2.3M edges. Our aim is to rigorously evaluate the contributions of three distinct sources of information to the label prediction in a semi-supervised node classification setting, namely the content of the articles, their connections with each other and the correlations among their labels. We perform this evaluation using a Graph Markov Neural Network, which provides a theoretically principled model for this task, and we conduct a detailed evaluation of the contribution of each source of information using a clear separation of model selection and model assessment. One interesting observation is that including the effect of label dependencies is more relevant for sparse train sets than it is for dense train sets.
( 2
min )
3D building models with facade details now play an important role in many applications. Classifying point clouds at facade-level is key to creating such digital replicas of the real world. However, few studies have focused on such detailed classification with deep neural networks. We propose a method fusing geometric features with deep learning networks for point cloud classification at facade-level. Our experiments conclude that such early-fused features improve deep learning methods' performance. This method can be applied to compensate for deep learning networks' limited ability to capture local geometric information and to promote the advancement of semantic segmentation.
( 2
min )
We present a self-contained proof of the convergence rate of the Stochastic Gradient Descent (SGD) when the learning rate follows an inverse time decay schedule; we next apply the results to the convergence of a modified form of policy gradient Multi-Armed Bandit (MAB) with $L2$ regularization.
( 2
min )
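As an illustration of the schedule named above, the following sketch runs SGD with step size `eta0 / (1 + t)` on a noisy one-dimensional quadratic. The objective, the noise model, the constants and the function names are all assumptions made for this example, not taken from the paper.

```python
import random


def sgd_inverse_time_decay(grad, x0, eta0, steps, rng):
    """Run SGD with the inverse time decay step size eta_t = eta0 / (1 + t)."""
    x = x0
    for t in range(steps):
        eta = eta0 / (1 + t)
        x -= eta * grad(x, rng)
    return x


def noisy_grad(x, rng):
    """Stochastic gradient of f(x) = 0.5 * (x - 3)**2 with unit Gaussian noise."""
    return (x - 3.0) + rng.gauss(0.0, 1.0)


x_final = sgd_inverse_time_decay(
    noisy_grad, x0=0.0, eta0=1.0, steps=20000, rng=random.Random(0)
)
```

With `eta0 = 1.0` on this particular objective, the iterate equals the running average of the noisy targets, so the decaying step size averages the gradient noise out over time.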
We propose a novel nonparametric sequential test for composite hypotheses for means of multiple data streams. Our proposed method, \emph{peeking with expectation-based averaged capital} (PEAK), builds upon the testing-as-betting framework and provides a non-asymptotic $\alpha$-level test across any stopping time. PEAK is computationally tractable and efficiently rejects hypotheses that are incorrect across all potential distributions that satisfy our nonparametric assumption, enabling joint composite hypothesis testing on multiple streams of data. We numerically validate our theoretical findings under the best arm identification and threshold identification in the bandit setting, illustrating the computational efficiency of our method against state-of-the-art testing methods.
( 2
min )
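The testing-as-betting framework mentioned above can be sketched for a single stream. This is a generic wealth-process test, not PEAK itself, whose construction the abstract does not give. It assumes observations in [0, 1], the null hypothesis "mean is at most `m`", and a fixed bet size `lam`. All names are hypothetical.

```python
def betting_test(stream, m, alpha=0.05, lam=1.0):
    """Sequentially test H0: E[x] <= m for observations x in [0, 1].

    The bettor's wealth is multiplied by 1 + lam * (x - m) after each
    observation. Under H0 the wealth is a non-negative supermartingale, so by
    Ville's inequality it reaches 1 / alpha with probability at most alpha.
    Returns the 1-based rejection time, or None if H0 is never rejected.
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    if not 0.0 <= lam <= 1.0 / m:
        raise ValueError("lam must lie in [0, 1/m] to keep wealth non-negative")
    wealth = 1.0
    for t, x in enumerate(stream, start=1):
        wealth *= 1.0 + lam * (x - m)
        if wealth >= 1.0 / alpha:
            return t
    return None


rejected_at = betting_test([0.9] * 50, m=0.5)
never_rejected = betting_test([0.5] * 50, m=0.5)
```

Because the guarantee holds at every time step simultaneously, the stream can be inspected after each observation without inflating the error level, which is the "any stopping time" property the abstract refers to.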
In this paper, we present a fully automatic brain tumor segmentation and classification model using a Deep Convolutional Neural Network that includes a multiscale approach. One of the differences of our proposal with respect to previous works is that input images are processed in three spatial scales along different processing pathways. This mechanism is inspired by the inherent operation of the Human Visual System. The proposed neural model can analyze MRI images containing three types of tumors: meningioma, glioma, and pituitary tumor, over sagittal, coronal, and axial views and does not need preprocessing of input images to remove skull or vertebral column parts in advance. The performance of our method on a publicly available MRI image dataset of 3064 slices from 233 patients is compared with previously published classical machine learning and deep learning methods. In the comparison, our method obtained a tumor classification accuracy of 0.973, higher than the other approaches using the same database.
( 2
min )
The use of a wide range of computer vision solutions, and more recently high-end Inertial Measurement Units (IMU), has become increasingly popular for assessing human physical activity in clinical and research settings. Nevertheless, to increase the feasibility of patient tracking in out-of-the-lab settings, it is necessary to use a reduced number of devices for movement acquisition. Promising solutions in this context are IMU-based wearables and single camera systems. Additionally, the development of machine learning systems able to recognize and digest clinically relevant data in-the-wild is needed, and therefore determining the ideal input to those is crucial.
( 2
min )
We study the problem of learning to predict the next state of a dynamical system when the underlying evolution function is unknown. Unlike previous work, we place no parametric assumptions on the dynamical system, and study the problem from a learning theory perspective. We define new combinatorial measures and dimensions and show that they quantify the optimal mistake and regret bounds in the realizable and agnostic settings, respectively.
( 2
min )
Time series analysis is relevant in various disciplines such as physics, biology, chemistry, and finance. In this paper, we present a novel neural network architecture that integrates elements from ResNet structures, while introducing the innovative incorporation of the Taylor series framework. This approach demonstrates notable enhancements in test accuracy across many of the baseline datasets investigated. Furthermore, we extend our method to incorporate a recursive step, which leads to even further improvements in test accuracy. Our findings underscore the potential of our proposed model to significantly advance time series analysis methodologies, offering promising avenues for future research and application.
( 2
min )
In recent years, Deep Neural Network-based systems have not only increased in popularity but also received growing user trust. However, due to the closed-world assumption of such systems, they cannot recognize samples from unknown classes and often induce an incorrect label with high confidence. The presented work looks at the evaluation of methods for Open Set Recognition, focusing on the impact of class imbalance, especially in the dichotomy between known and unknown samples. As an outcome of problem analysis, we present a set of guidelines for evaluation of methods in this field.
( 2
min )
This post is co-written with Kostia Kofman and Jenny Tokar from Booking.com. As a global leader in the online travel industry, Booking.com is always seeking innovative ways to enhance its services and provide customers with tailored and seamless experiences. The Ranking team at Booking.com plays a pivotal role in ensuring that the search and recommendation […]
( 12
min )
Every country needs to own the production of their own intelligence, NVIDIA founder and CEO Jensen Huang told attendees Monday at the World Governments Summit in Dubai. Huang, who spoke as part of a fireside chat with the UAE’s Minister of AI, His Excellency Omar Al Olama, described sovereign AI — which emphasizes a country’s
( 6
min )
Generative AI is driving change across industries — and to take advantage of its benefits, businesses must select the right hardware to power their workflows. The new NVIDIA RTX 2000 Ada Generation GPU delivers the latest AI, graphics and compute technology to compact workstations, offering up to 1.5x the performance of the previous-generation RTX A2000
( 7
min )
AI Weirdness: the strange side of machine learning
( 2
min )
The power requirements posed by the fifth-generation and beyond cellular networks are an important constraint in network deployment and require energy-efficient solutions. In this work, we propose a novel user load transfer approach using airborne base stations (BS) mounted on drones for reliable and secure power redistribution across the micro-grid network comprising green small cell BSs. Depending on the user density and the availability of an aerial BS, the energy requirement of a cell with an energy deficit is accommodated by migrating the aerial BS from a high-energy to a low-energy cell. The proposed hybrid drone-based framework integrates long short-term memory with unique cost functions using an evolutionary neural network for drones and BSs and efficiently manages energy and load redistribution. The proposed algorithm reduces power outages at BSs and maintains consistent throughput stability, thereby demonstrating its capability to boost the reliability and robustness of wireless communication systems.
( 2
min )
In recent years, there has been an intense debate about how learning in biological neural networks (BNNs) differs from learning in artificial neural networks. It is often argued that the updating of connections in the brain relies only on local information, and therefore a stochastic gradient-descent type optimization method cannot be used. In this paper, we study a stochastic model for supervised learning in BNNs. We show that a (continuous) gradient step occurs approximately when each learning opportunity is processed by many local updates. This result suggests that stochastic gradient descent may indeed play a role in optimizing BNNs.
( 2
min )
This work proposes $\mu$GUIDE: a general Bayesian framework to estimate posterior distributions of tissue microstructure parameters from any given biophysical model or MRI signal representation, with exemplar demonstration in diffusion-weighted MRI. Harnessing a new deep learning architecture for automatic signal feature selection combined with simulation-based inference and efficient sampling of the posterior distributions, $\mu$GUIDE bypasses the high computational and time cost of conventional Bayesian approaches and does not rely on acquisition constraints to define model-specific summary statistics. The obtained posterior distributions make it possible to highlight degeneracies present in the model definition and to quantify the uncertainty and ambiguity of the estimated parameters.
( 2
min )
We present a framework for learning Hamiltonian systems using data. This work is based on a lifting hypothesis, which posits that nonlinear Hamiltonian systems can be written as nonlinear systems with cubic Hamiltonians. By leveraging this, we obtain quadratic dynamics that are Hamiltonian in a transformed coordinate system. To that end, for given generalized position and momentum data, we propose a methodology to learn quadratic dynamical systems, enforcing the Hamiltonian structure in combination with a weakly-enforced symplectic auto-encoder. The obtained Hamiltonian structure exhibits long-term stability of the system, while the cubic Hamiltonian function provides relatively low model complexity. For low-dimensional data, we determine a higher-dimensional transformed coordinate system, whereas for high-dimensional data, we find a lower-dimensional coordinate system with the desired properties. We demonstrate the proposed methodology by means of both low-dimensional and high-dimensional nonlinear Hamiltonian systems.
( 2
min )
Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.
( 2
min )
We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. Our key contribution is introducing new proof methods that result in tighter bounds for multi-armed BAI compared to existing methods. We extensively compare our approach to other fixed-budget BAI methods, demonstrating its consistent and robust performance in various settings. Our work improves our understanding of Bayesian fixed-budget BAI in structured bandits and highlights the effectiveness of our approach in practical scenarios.
( 2
min )
Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models are predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in the real-world setting. To address this challenge, we propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process, where the targets are derived from a visually grounded speech model, notably eliminating the need for speech-text paired data. Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
( 2
min )
Background: Eating disorders are increasingly prevalent, and social networks offer valuable information.
Objective: Our goal was to identify efficient machine learning models for categorizing tweets related to eating disorders.
Methods: Over three months, we collected tweets about eating disorders. A 2,000-tweet subset was labeled for: (1) being written by individuals with eating disorders, (2) promoting eating disorders, (3) informativeness, and (4) scientific content. Both traditional machine learning and deep learning models were employed for classification, assessing accuracy, F1 score, and computational time.
Results: From 1,058,957 collected tweets, transformer-based bidirectional encoder representations achieved the highest F1 scores (71.1%-86.4%) across all four categories.
Conclusions: Transformer-based models outperform traditional techniques in classifying eating disorder-related tweets, though they require more computational resources.
( 2
min )
We consider the problem of learning local quantum Hamiltonians given copies of their Gibbs state at a known inverse temperature, following Haah et al. [2108.04842] and Bakshi et al. [arXiv:2310.02243]. Our main technical contribution is a new flat polynomial approximation of the exponential function based on the Chebyshev expansion, which enables the formulation of learning quantum Hamiltonians as a polynomial optimization problem. This, in turn, can benefit from the use of moment/SOS relaxations, whose polynomial bit complexity requires careful analysis [O'Donnell, ITCS 2017]. Finally, we show that learning a $k$-local Hamiltonian, whose dual interaction graph is of bounded degree, runs in polynomial time under mild assumptions.
( 2
min )
In this paper, we first present the character texture generation system \textit{Minecraft-ify}, designed for in-game application in the Minecraft video game. Our system can generate face-focused images for texture mapping tailored to 3D virtual characters with a cube manifold. While existing projects or works only generate texture, the proposed system can invert a user-provided real image, or generate an average/random appearance from the learned distribution. Moreover, it can be manipulated with text guidance using StyleGAN and StyleCLIP. These features provide a more extended user experience with greater freedom as a user-friendly AI tool. The project page can be found at https://gh-bumsookim.github.io/Minecraft-ify/
( 2
min )
Monitoring the status of large computing systems is essential to identify unexpected behavior and improve their performance and uptime. However, due to the large-scale and distributed design of such computing systems as well as a large number of monitoring parameters, automated monitoring methods should be applied. Such automatic monitoring methods should also have the ability to adapt themselves to the continuous changes in the computing system. In addition, they should be able to identify behavioral anomalies in useful time, to perform appropriate reactions. This work proposes a general lightweight and unsupervised method for near real-time anomaly detection using operational data measurement on large computing systems. The proposed model requires as little as 4 hours of data and 50 epochs for each training process to accurately resemble the behavioral pattern of computing systems.
( 2
min )
We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional Q($\lambda$) does not apply importance sampling for off-policy learning, which introduces intriguing interactions with signed measures. Such unique properties distinguish distributional Q($\lambda$) from other existing alternatives such as distributional Retrace. We characterize the algorithmic properties of distributional Q($\lambda$) and validate theoretical insights with tabular experiments. We show how distributional Q($\lambda$)-C51, a combination of Q($\lambda$) with the C51 agent, exhibits promising results on deep RL benchmarks.
( 2
min )
We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, provided that the single-armed relaxed problem is unichain and aperiodic. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Uniform Global Attractor Property (UGAP) to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption (SA).
( 2
min )
We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide theoretical bounds on its performance across diverse models, including the first prior-dependent upper bounds for linear and hierarchical BAI. Our key contribution is introducing new proof methods that result in tighter bounds for multi-armed BAI compared to existing methods. We extensively compare our approach to other fixed-budget BAI methods, demonstrating its consistent and robust performance in various settings. Our work improves our understanding of Bayesian fixed-budget BAI in structured bandits and highlights the effectiveness of our approach in practical scenarios.
( 2
min )
In this post, we show you how to build an internal SaaS layer to access foundation models with Amazon Bedrock in a multi-tenant (team) architecture. We specifically focus on usage and cost tracking per tenant and also controls such as usage throttling per tenant. We describe how the solution and Amazon Bedrock consumption plans map to the general SaaS journey framework. The code for the solution and an AWS Cloud Development Kit (AWS CDK) template is available in the GitHub repository.
( 13
min )
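The per-tenant usage tracking and throttling described in this post can be illustrated with a minimal sketch. It does not use the Amazon Bedrock API or the code from the post's repository; the class name `TenantUsageTracker`, the tenant ids, the quotas, and the price per 1,000 tokens are all hypothetical values chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TenantUsageTracker:
    """Hypothetical per-tenant token accounting with a hard quota per window."""

    token_quota: dict  # tenant id -> max tokens allowed in the current window
    cost_per_1k_tokens: float = 0.01  # illustrative price, not a real rate
    used: dict = field(default_factory=lambda: defaultdict(int))

    def try_consume(self, tenant: str, tokens: int) -> bool:
        """Record usage if the tenant stays within quota; otherwise throttle."""
        limit = self.token_quota.get(tenant, 0)  # unknown tenants get no quota
        if self.used[tenant] + tokens > limit:
            return False  # throttled: this request would exceed the quota
        self.used[tenant] += tokens
        return True

    def cost(self, tenant: str) -> float:
        """Cost attributed to a tenant for the tokens consumed so far."""
        return self.used[tenant] / 1000 * self.cost_per_1k_tokens


tracker = TenantUsageTracker(token_quota={"team-a": 1000, "team-b": 500})
```

A gateway in front of the model endpoint would call `try_consume` before forwarding each request and reject the request when it returns `False`; `cost` then gives a per-tenant figure for chargeback reports.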
2024 is the year of great data science predictions targeting big business churn. It is the time to yield benefits from the popular data science frameworks that are streamed to do wonders for industries far and wide. Data science is not just a spoof on the big number game that guides businesses’ growth. It is… Read More »10 Prominent Data Science Predictions 2024- Know What the Industry Experts Say?
The post 10 Prominent Data Science Predictions 2024- Know What the Industry Experts Say? appeared first on Data Science Central.
( 22
min )
Autonomous helicopters made by Rotor Technologies, a startup led by MIT PhDs, take the human out of risky commercial missions.
( 7
min )
Many imitation learning (IL) algorithms employ inverse reinforcement learning (IRL) to infer the intrinsic reward function that an expert is implicitly optimizing for based on their demonstrated behaviors. However, in practice, IRL-based IL can fail to accomplish the underlying task due to a misalignment between the inferred reward and the objective of the task. In this paper, we address the susceptibility of IL to such misalignment by introducing a semi-supervised reward design paradigm called Protagonist Antagonist Guided Adversarial Reward (PAGAR). PAGAR-based IL trains a policy to perform well under mixed reward functions instead of a single reward function as in IRL-based IL. We identify the theoretical conditions under which PAGAR-based IL can avoid the task failures caused by reward misalignment. We also present a practical on-and-off policy approach to implementing PAGAR-based IL. Experimental results show that our algorithm outperforms standard IL baselines in complex tasks and challenging transfer settings.
( 2
min )
Recently, we demonstrated success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based on only the test dataset might not effectively indicate a trained DNN's ability to handle input perturbations. As such, we analytically verify robustness and trustworthiness of DNNs to input perturbations by treating them as mixed-integer linear programming (MILP) problems. The ability of batch normalization in addressing the scalability limitations of the MILP formulation is also highlighted. The framework is validated by performing time-synchronized distribution system state estimation for a modified IEEE 34-node system and a real-world large distribution system, both of which are incompletely observed by micro-phasor measurement units.
( 2
min )
We present a theoretical and empirical analysis of adaptive entry point selection for graph-based approximate nearest neighbor search (ANNS). We introduce two novel concepts, $b\textit{-monotonic path}$ and $B\textit{-MSNET}$, which capture the graphs used in practical algorithms better than existing concepts such as MSNET. We prove that adaptive entry point selection offers a better performance upper bound than a fixed central entry point, under more general conditions than previous work. Empirically, we validate the method's effectiveness in accuracy, speed, and memory usage across various datasets, especially in challenging scenarios with out-of-distribution data and hard instances. Our comprehensive study provides deeper insights into optimizing entry points for graph-based ANNS for real-world high-dimensional data applications.
( 2
min )
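To make the idea of adaptive entry point selection concrete, here is a toy sketch. It is not the paper's algorithm: the greedy walk, the chain graph, and the candidate entry set are assumptions chosen only to show why starting closer to the query can shorten the search.

```python
import math


def greedy_search(points, graph, query, entry):
    """Greedy graph walk: move to the neighbour closest to the query
    until no neighbour improves the distance. Returns (node, hops)."""
    current, hops = entry, 0
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current, hops
        current, hops = best, hops + 1


def adaptive_entry(points, candidates, query):
    """Pick, among a few candidate entry nodes, the one nearest the query."""
    return min(candidates, key=lambda c: math.dist(points[c], query))


# Toy data (hypothetical): ten points on a line, connected as a chain.
points = {i: (float(i), 0.0) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
query = (8.2, 0.0)

fixed_result = greedy_search(points, graph, query, entry=4)
adaptive_result = greedy_search(
    points, graph, query, entry=adaptive_entry(points, [1, 4, 8], query)
)
```

Both searches end at node 8, but the fixed central entry needs four hops while the adaptive entry needs none, at the price of three extra distance computations to choose the entry.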
The privacy-utility tradeoff remains one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff by significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on the MNIST dataset, the Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered, and it is observed that the accuracy loss due to data being distributed is either marginal or not significantly high.
( 2
min )
We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.
( 2
min )
Graph transformers typically lack direct pair-to-pair communication, instead forcing neighboring pairs to exchange information via a common node. We propose the Triplet Graph Transformer (TGT) that enables direct communication between two neighboring pairs in a graph via novel triplet attention and aggregation mechanisms. TGT is applied to molecular property prediction by first predicting interatomic distances from 2D graphs and then using these distances for downstream tasks. A novel three-stage training procedure and stochastic inference further improve training efficiency and model performance. Our model achieves new state-of-the-art (SOTA) results on open challenge benchmarks PCQM4Mv2 and OC20 IS2RE. We also obtain SOTA results on QM9, MOLPCBA, and LIT-PCBA molecular property prediction benchmarks via transfer learning. We also demonstrate the generality of TGT with SOTA results on the traveling salesman problem (TSP).
( 2
min )
This thesis proposes and describes a research attempt at designing and developing a speaker-independent spontaneous automatic speech recognition system for Tigrigna. The acoustic model of the speech recognition system is developed using the Carnegie Mellon University automatic speech recognition development tool (Sphinx), while the SRIM tool is used for the development of the language model.
Keywords: Automatic Speech Recognition, Tigrigna language
( 2
min )
We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers 48.9x measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at https://github.com/mit-han-lab/efficientvit.
( 2
min )
The probabilistic formal verification (PFV) of AI systems is in its infancy. So far, approaches have been limited to ad-hoc algorithms for specific classes of models and/or properties.
We propose a unifying framework for the PFV of AI systems based on Weighted Model Integration (WMI), which allows us to frame the problem in very general terms.
Crucially, this reduction enables the verification of many properties of interest, like fairness, robustness or monotonicity, over a wide range of machine learning models, without making strong distributional assumptions.
We support the generality of the approach by solving multiple verification tasks with a single, off-the-shelf WMI solver, then discuss the scalability challenges and research directions related to this promising framework.
( 2
min )
We consider nonconvex stochastic optimization problems in the asynchronous centralized distributed setup where the communication times from workers to a server cannot be ignored, and the computation and communication times are potentially different for all workers. Using an unbiased compression technique, we develop a new method, Shadowheart SGD, that provably improves the time complexities of all previous centralized methods. Moreover, we show that the time complexity of Shadowheart SGD is optimal in the family of centralized methods with compressed communication. We also consider the bidirectional setup, where broadcasting from the server to the workers is non-negligible, and develop a corresponding method.
( 2
min )
In this paper, we present a novel sequential team selection model in soccer. Specifically, we model the stochastic process of player injury and unavailability using player-specific information learned from real-world soccer data. Monte-Carlo Tree Search is used to select teams for games that optimise long-term team performance across a soccer season by reasoning over player injury probability. We validate our approach compared to benchmark solutions for the 2018/19 English Premier League season. Our model achieves similar season expected points to the benchmark whilst reducing first-team injuries by ~13% and the money inefficiently spent on injured players by ~11% - demonstrating the potential to reduce costs and improve player welfare in real-world soccer teams.
( 2
min )
The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
( 2
min )
NVIDIA has joined the National Institute of Standards and Technology’s new U.S. Artificial Intelligence Safety Institute Consortium as part of the company’s effort to advance safe, secure and trustworthy AI. AISIC will work to create tools, methodologies and standards to promote the safe and trustworthy development and deployment of AI. As a member, NVIDIA will
Read Article
( 5
min )
The GeForce NOW anniversary celebrations continue with more games and a member-exclusive discount on the Logitech G Cloud. Among the six new titles coming to the cloud this week is The Inquisitor from Kalypso Media, which spotlights the GeForce NOW anniversary with a special shout-out. “Congrats to four years of empowering gamers to play anywhere,
Read Article
( 6
min )
Nonprofit fundraising tools can be excellent resources for assisting organizations in maintaining compliance. However, anyone considering these platforms should know a few things to stay on the right track and avoid issues. Organizations must protect donors’ privacy When a nonprofit’s staff members know details about donors’ sexual orientation, income, race, age and ethnicity, it’s easier… Read More »What nonprofits need to know about compliance for fundraising software
The post What nonprofits need to know about compliance for fundraising software appeared first on Data Science Central.
( 21
min )
Generative AI agents are a versatile and powerful tool for large enterprises. They can enhance operational efficiency, customer service, and decision-making while reducing costs and enabling innovation. These agents excel at automating a wide range of routine and repetitive tasks, such as data entry, customer support inquiries, and content generation. Moreover, they can orchestrate complex, […]
( 19
min )
Over the eight months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention, due to their potent mix of capability and accessibility. While a niche industry of papers has emerged examining the scope of capabilities these models possess, the information fed to and extracted from these networks has been either natural language text or stylized, code-like language. Drawing inspiration from the prowess we expect a truly human-level intelligent agent to have across multiple signal modalities, in this work we examine GPT3.5's aptitude for visual tasks, where the inputs feature content provided as ASCII-art without overt distillation into a lingual summary. We conduct experiments analyzing the model's performance on image recognition tasks after various transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.
( 3
min )
We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also provide natural suggestions for alternative architectures.
( 2
min )
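For readers unfamiliar with the dot-product self-attention component named in this abstract, the following is a minimal standard-library sketch of scaled dot-product attention. It illustrates the standard mechanism only, not the approximation analysis of the paper; the function names and the tiny example matrices are chosen here for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    with Q, K, V given as lists of row vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return out


# With two identical keys the weights are uniform, so the output is the
# mean of the two value vectors.
result = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [1.0, 0.0]], V=[[0.0, 2.0], [4.0, 6.0]])
```

In this example `result` is `[[2.0, 4.0]]`, the average of the two value rows.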
In this study, we explore the synergy of deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is apt for advanced deep-learning techniques. A pivotal challenge in pair trading is discerning temporal correlations among entities, necessitating the integration of diverse data modalities. Addressing this, we introduce a novel framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and employs a memory-based temporal graph neural network. This approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL, emphasizing its promise in refining automated pair trading strategies.
( 2
min )
In this paper, we perform a non-asymptotic analysis of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the bias introduced by local training with heterogeneous agents, and investigate the sample complexity of the algorithm. We show that the communication complexity of FedLSA scales polynomially with the desired precision $\epsilon$, which limits the benefits of federation. To overcome this, we propose SCAFFLSA, a novel variant of FedLSA, that uses control variates to correct the bias of local training, and prove its convergence without assumptions on statistical heterogeneity. We apply the proposed methodology to federated temporal difference learning with linear function approximation, and analyze the corresponding complexity improvements.
( 2
min )
Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced data sets. Asymptotically, we prove that SMOTE (with default parameter) regenerates the original distribution by simply copying the original minority samples. We also prove that SMOTE density vanishes near the boundary of the support of the minority distribution, therefore justifying the common BorderLine SMOTE strategy. Then we introduce two new SMOTE-related strategies, and compare them with state-of-the-art rebalancing procedures. We show that rebalancing strategies are only required when the data set is highly imbalanced. For such data sets, SMOTE, our proposals, or undersampling procedures are the best strategies.
( 2
min )
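The interpolation step at the heart of SMOTE can be sketched in a few lines. This is a simplified standard-library illustration, not the authors' code or their new strategies: each synthetic point is drawn uniformly on the segment between a random minority sample and one of its `k` nearest minority neighbours, with `k=5` as the usual default. The function name `smote` and the toy data are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by linear interpolation
    between a sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        z = rng.choice(neighbours)
        w = rng.random()  # interpolation weight, uniform in [0, 1)
        synthetic.append(tuple(xi + w * (zi - xi) for xi, zi in zip(x, z)))
    return synthetic


# Toy minority class (hypothetical): the corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=20)
```

Because every synthetic point lies on a segment between two minority samples, all generated points stay inside the convex hull of the minority class, which is consistent with the abstract's remark that the SMOTE density vanishes near the boundary of the support.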
This paper explores the integration of Explainable Automated Machine Learning (AutoML) in the realm of financial engineering, specifically focusing on its application in credit decision-making. The rapid evolution of Artificial Intelligence (AI) in finance has necessitated a balance between sophisticated algorithmic decision-making and the need for transparency in these systems. The focus is on how AutoML can streamline the development of robust machine learning models for credit scoring, while Explainable AI (XAI) methods, particularly SHapley Additive exPlanations (SHAP), provide insights into the models' decision-making processes. This study demonstrates how the combination of AutoML and XAI not only enhances the efficiency and accuracy of credit decisions but also fosters trust and collaboration between humans and AI systems. The findings underscore the potential of explainable AutoML in improving the transparency and accountability of AI-driven financial decisions, aligning with regulatory requirements and ethical considerations.
( 2
min )
This paper introduces a novel decision-making framework that promotes consistency among decisions made by diverse models while utilizing external knowledge. Leveraging the Integer Linear Programming (ILP) framework, we map predictions from various models into globally normalized and comparable values by incorporating information about decisions' prior probability, confidence (uncertainty), and the models' expected accuracy. Our empirical study demonstrates the superiority of our approach over conventional baselines on multiple datasets.
( 2
min )
Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimazaki, 2017) on which subsequent NNs are trained within a week. DFT labels molecules by minimizing energy $E(\cdot )$ as a "loss function." We bypass dataset creation by directly training NNs with $E(\cdot )$ as a loss function. For comparison, Schütt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.
( 2
min )
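The time comparison in this abstract can be checked with a few lines of arithmetic. The hour figures come from the abstract; the derived speedup ratio is computed here and is not a number reported by the authors.

```python
# Hours reported in the abstract for the dataset-based baseline.
dataset_hours = 626
training_hours = 160
baseline_total = dataset_hours + training_hours  # 786 hours in total

# Hours reported for training directly with the energy as the loss.
direct_hours = 31

# Derived ratio (not stated in the abstract): how many times less
# wall-clock time the direct approach needs.
speedup = baseline_total / direct_hours
print(f"baseline: {baseline_total} h, direct: {direct_hours} h, ratio: {speedup:.1f}x")
```

Under these figures, the direct approach takes roughly one twenty-fifth of the baseline's total time.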
Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are now entering an era of Trillion Parameter Models (TPM), or models with more than a trillion parameters -- such as Huawei's PanGu-$\Sigma$. We describe a vision for the ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements of a comprehensive software stack and interfaces to support the diverse and flexible requirements of researchers.
( 2
min )
Gaussian processes are a widely embraced technique for regression and classification due to their good prediction accuracy, analytical tractability and built-in capabilities for uncertainty quantification. However, they suffer from the curse of dimensionality whenever the number of variables increases. This challenge is generally addressed by assuming additional structure in the problem, the preferred options being either additivity or low intrinsic dimensionality. Our contribution for high-dimensional Gaussian process modeling is to combine them with a multi-fidelity strategy, showcasing the advantages through experiments on synthetic functions and datasets.
( 2
min )
In the first post of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. In the second post, we discussed an approach to develop a deep learning-based computer vision model […]
( 10
min )
The emergence of large language models (LLMs) has revolutionized the way people create text and interact with computing. However, these models are limited in ensuring the accuracy of the content they generate and enforcing strict compliance with specific formats, such as JSON and other computer programming languages. Additionally, LLMs that process information from multiple sources […]
The post AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM appeared first on Microsoft Research.
( 11
min )
Research Focus: New Research Forum series explores bold ideas in the era of AI; LASER improves reasoning in language models; Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets; Six Microsoft researchers named 2023 ACM Fellows.
The post Research Focus: Week of February 5, 2024 appeared first on Microsoft Research.
( 10
min )
With advances in computing, sophisticated AI models and machine learning are having a profound impact on business and society. Industries can use AI to quickly analyze vast bodies of data, allowing them to derive meaningful insights, make predictions and automate processes for greater efficiency. In the public sector, government agencies are achieving superior disaster preparedness.
Read Article
( 14
min )
The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposition (EDMD). In this paper, we propose Stability- and certificate-oriented EDMD (SafEDMD): a novel EDMD-based learning architecture which comes along with rigorous certificates, resulting in a reliable surrogate model generated in a data-driven fashion. To ensure trustworthiness of SafEDMD, we derive proportional error bounds, which vanish at the origin and are tailored for control tasks, leading to certified controller design based on semi-definite programming. We illustrate the developed machinery by means of several benchmark examples and highlight the advantages over state-of-the-art methods.
( 2
min )
This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.
( 2
min )
We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Process. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}({\epsilon^{-2}})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
( 2
min )
Studying the complex interactions between different brain regions is crucial in neuroscience. Various statistical methods have explored the latent communication across multiple brain regions. Two main categories are the Gaussian Process (GP) and Linear Dynamical System (LDS), each with unique strengths. The GP-based approach effectively discovers latent variables such as frequency bands and communication directions. Conversely, the LDS-based approach is computationally efficient but lacks powerful expressiveness in latent representation. In this study, we merge both methodologies by creating an LDS mirroring a multi-output GP, termed Multi-Region Markovian Gaussian Process (MRM-GP). Our work is the first to establish a connection between an LDS and a multi-output GP that explicitly models frequencies and phase delays within the latent space of neural recordings. Consequently, the model achieves a linear inference cost over time points and provides an interpretable low-dimensional representation, revealing communication directions across brain regions and separating oscillatory communications into different frequency bands.
( 2
min )
Transformer-based models still face the structural limitation of fixed context length in processing long sequence input despite their effectiveness in various fields. While various external memory techniques were introduced, most previous techniques fail to avoid fateful forgetting, where even the most important memories are inevitably forgotten after a sufficient number of time steps. We designed Memoria, a memory system for artificial neural networks, drawing inspiration from humans and applying various neuroscientific and psychological theories related to memory. Experimentally, we demonstrated the effectiveness of Memoria in tasks such as sorting and language modeling, surpassing conventional techniques.
( 2
min )
Deep reinforcement learning (DRL) has significantly advanced the field of combinatorial optimization (CO). However, its practicality is hindered by the necessity for a large number of reward evaluations, especially in scenarios involving computationally intensive function assessments. To enhance the sample efficiency, we propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various DRL methods. Our method leverages high-reward samples to encourage exploration of the under-explored symmetric regions for free, that is, without additional online interactions. Through replay training, the policy is trained to maximize the likelihood of the symmetric trajectories of discovered high-reward samples. Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks, such as molecular optimization and hardware design.
( 2
min )
In this paper, we clarify the crucial difference between a deep neural network and the Fourier series. For the multiple Fourier series of periodization of some radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the behavior of the spherical partial sum and discovered the third phenomenon other than the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular, the third one exhibits prevention of pointwise convergence. In contrast to it, we give a specific deep neural network and prove pointwise convergence.
( 2
min )
Consensus control in multi-agent systems has received significant attention and practical implementation across various domains. However, managing consensus control under unknown dynamics remains a significant challenge for control design due to system uncertainties and environmental disturbances. This paper presents a novel learning-based distributed control law, augmented by auxiliary dynamics. Gaussian processes are harnessed to compensate for the unknown components of the multi-agent system. To continuously improve the predictive performance of the Gaussian process model, a data-efficient online learning strategy with a decentralized event-triggered mechanism is proposed. Furthermore, the control performance of the proposed approach is ensured via Lyapunov theory, based on a probabilistic guarantee for prediction error bounds. To demonstrate the efficacy of the proposed learning-based controller, a comparative analysis is conducted, contrasting it with both conventional distributed control laws and offline learning methodologies.
( 2
min )
We examine the impact of homograph attacks on the Sentiment Analysis (SA) task of different Arabic dialects from the Maghreb North-African countries. Homograph attacks result in a 65.3% decrease in transformer classification, from an F1-score of 0.95 to 0.33, when data is written in "Arabizi". The goal of this study is to highlight LLMs' weaknesses and to prioritize ethical and responsible Machine Learning.
( 2
min )
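The 65.3% figure in the homograph-attack abstract above is consistent with a relative (not absolute) drop in F1-score. A minimal sketch of that arithmetic, where the function name `relative_decrease` is an illustrative choice and not something taken from the paper:

```python
def relative_decrease(before: float, after: float) -> float:
    """Return the relative drop from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (before - after) / before * 100.0


# F1-scores quoted in the abstract: 0.95 before the attack, 0.33 after.
drop = relative_decrease(0.95, 0.33)
print(f"{drop:.1f}%")  # 65.3%
```

The absolute drop is 0.62 F1 points; dividing by the starting value of 0.95 gives the quoted percentage.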
This paper introduces DogSurf, a new approach to using quadruped robots to help visually impaired people navigate in the real world. The presented method allows the quadruped robot to detect slippery surfaces, and to use audio and haptic feedback to inform the user when to stop. A state-of-the-art GRU-based neural network architecture with a mean accuracy of 99.925% was proposed for the task of multiclass surface classification for quadruped robots. A dataset was collected on a Unitree Go1 Edu robot. The dataset and code have been posted to the public domain.
( 2
min )
This work presents an innovative learning-based approach to tackle the tracking control problem of Euler-Lagrange multi-agent systems with partially unknown dynamics operating under switching communication topologies. The approach leverages a correlation-aware cooperative algorithm framework built upon Gaussian process regression, which adeptly captures inter-agent correlations for uncertainty predictions. A standout feature is its exceptional efficiency in deriving the aggregation weights achieved by circumventing the computationally intensive posterior variance calculations. Through Lyapunov stability analysis, the distributed control law ensures bounded tracking errors with high probability. Simulation experiments validate the protocol's efficacy in effectively managing complex scenarios, establishing it as a promising solution for robust tracking control in multi-agent systems characterized by uncertain dynamics and dynamic communication structures.
( 2
min )
This literature review gives an overview of current approaches to domain adaptation in a low-resource setting and of approaches to multilingual semantic search in a low-resource setting. We developed a new typology to cluster domain adaptation approaches based on the part of a dense textual information retrieval system that they adapt, focusing on how to combine them efficiently. We also explore the possibilities of combining multilingual semantic search with domain adaptation approaches for dense retrievers in a low-resource setting.
( 2
min )
Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors, has emerged as a crucial tool in the development of proactive cyberdefense. However, the majority of extant works offer subtle perturbations or additions to executable files and do not explore full-file obfuscation. In this study, we show that an open-source encryption tool coupled with a Reinforcement Learning (RL) framework can successfully obfuscate malware to evade state-of-the-art malware detection engines and outperform techniques that use advanced modification methods. Our results show that the proposed method improves the evasion rate by 27%-49% compared to widely used state-of-the-art reinforcement learning-based methods.
( 2
min )
Applied recommender systems research is in a curious position. While there is a very rigorous protocol for measuring performance by A/B testing, best practice for finding a `B' to test does not explicitly target performance but rather targets a proxy measure. The success or failure of a given A/B test then depends entirely on if the proposed proxy is better correlated to performance than the previous proxy. No principle exists to identify if one proxy is better than another offline, leaving the practitioners shooting in the dark. The purpose of this position paper is to question this anti-Utopian thinking and argue that a non-standard use of the deep learning stacks actually has the potential to unlock reward optimizing recommendation.
( 2
min )
With the increasing use of big data and business analytics, data storytelling has gained popularity as an effective means of communicating analytical insights to audiences to support decision making and improve business performance. However, there is little empirical evidence on the impact of data storytelling on data understanding. This study validates the concept of data storytelling as a construct in terms of its impact on users' data understanding. Based on empirical data analysis, the results of this study show that data storytelling competence is positively associated with organizational performance, which is partly due to the quality of the decisions made. These results provide a theoretical basis for further investigation of potential antecedents and consequences of data storytelling.
( 2
min )
In digital health, the strategy of allocating a limited treatment budget across available risk times is crucial to reduce user fatigue. This strategy, however, encounters a significant obstacle due to the unknown actual number of risk times, a factor not adequately addressed by existing methods lacking theoretical guarantees. This paper introduces, for the first time, the online uniform risk times sampling problem within the approximation algorithm framework. We propose two online approximation algorithms for this problem, one with and one without learning augmentation, and provide rigorous theoretical performance guarantees for them using competitive ratio analysis. We assess the performance of our algorithms using both synthetic experiments and a real-world case study on HeartSteps mobile applications.
( 2
min )
A methodology that seeks to enhance model prediction performance is presented. The method involves generating multiple auxiliary models that capture relationships between attributes as a function of each other. Such information serves to generate additional informative columns in the dataset that can potentially enhance target prediction. A proof of concept and related code are provided.
( 2
min )
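One possible reading of the auxiliary-model idea in the abstract above is sketched below. Everything here is an assumption made for illustration: the abstract does not say which model family is used, so the sketch swaps in a one-variable least-squares line, and the names `fit_line` and `add_auxiliary_columns` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Constant predictor: fall back to predicting the mean of y.
        return 0.0, mean_y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def add_auxiliary_columns(columns):
    """For every ordered pair of attributes, add a column holding one
    attribute as predicted from the other (assumed auxiliary model)."""
    augmented = dict(columns)
    names = list(columns)
    for target in names:
        for source in names:
            if source == target:
                continue
            slope, intercept = fit_line(columns[source], columns[target])
            augmented[f"{target}_from_{source}"] = [
                slope * value + intercept for value in columns[source]
            ]
    return augmented


# Toy data (hypothetical): attribute "b" is exactly twice attribute "a".
table = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0]}
augmented = add_auxiliary_columns(table)
print(sorted(augmented))  # ['a', 'a_from_b', 'b', 'b_from_a']
```

In practice the augmented table would be passed to the main predictor in place of the original one; whether the extra columns help is an empirical question.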
The paper presents the exact formula for the vector field that minimizes the loss for the standard flow. This formula depends analytically on a given distribution $\rho_0$ and an unknown one $\rho_1$. Based on the presented formula, a new loss and algorithm for training a vector field model in the style of Conditional Flow Matching are provided. Our loss, in comparison to the standard Conditional Flow Matching approach, exhibits smaller variance when evaluated through Monte Carlo sampling methods. Numerical experiments on synthetic models and models on tabular data of large dimensions demonstrate better learning results with the use of the presented algorithm.
( 2
min )
Infrared (IR) spectroscopy is a pivotal technique in chemical research for elucidating molecular structures and dynamics through vibrational and rotational transitions. However, the intricate molecular fingerprints characterized by unique vibrational and rotational patterns present substantial analytical challenges. Here, we present a machine learning approach employing a Structural Attention Mechanism tailored to enhance the prediction and interpretation of infrared spectra, particularly for diazo compounds. Our model distinguishes itself by honing in on chemical information proximal to functional groups, thereby significantly bolstering the accuracy, robustness, and interpretability of spectral predictions. This method not only demystifies the correlations between infrared spectral features and molecular structures but also offers a scalable and efficient paradigm for dissecting complex molecular interactions.
( 2
min )
This paper investigates the impact of multiscale data on machine learning algorithms, particularly in the context of deep learning. A dataset is multiscale if its distribution shows large variations in scale across different directions. This paper reveals multiscale structures in the loss landscape, including its gradients and Hessians inherited from the data. Correspondingly, it introduces a novel gradient descent approach, drawing inspiration from multiscale algorithms used in scientific computing. This approach seeks to transcend empirical learning rate selection, offering a more systematic, data-informed strategy to enhance training efficiency, especially in the later stages.
( 2
min )
Despite deep learning's widespread success, its data-hungry and computationally expensive nature makes it impractical for many data-constrained real-world applications. Few-Shot Learning (FSL) aims to address these limitations by enabling rapid adaptation to novel learning tasks, seeing significant growth in recent years. This survey provides a comprehensive overview of the field's latest advancements. Initially, FSL is formally defined, and its relationship with different learning fields is presented. A novel taxonomy is introduced, extending previously proposed ones, and real-world applications in classic and novel fields are described. Finally, recent trends shaping the field, outstanding challenges, and promising future research directions are discussed.
( 2
min )
We develop a new method HTBB for the multidimensional black-box approximation and gradient-free optimization, which is based on the low-rank hierarchical Tucker decomposition with the use of the MaxVol indices selection procedure. Numerical experiments for 14 complex model problems demonstrate the robustness of the proposed method for dimensions up to 1000, while it shows significantly more accurate results than classical gradient-free optimization methods, as well as approximation and optimization methods based on the popular tensor train decomposition, which represents a simpler case of a tensor network.
( 2
min )
We consider the problem of real-time reconstruction of urban air pollution maps. The task is challenging due to the heterogeneous sources of available data, the scarcity of direct measurements, the presence of noise, and the large surfaces that need to be considered. In this work, we introduce different reconstruction methods based on posing the problem on city graphs. Our strategies can be classified as fully data-driven, physics-driven, or hybrid, and we combine them with super-learning models. The performance of the methods is tested in the case of the inner city of Paris, France.
( 2
min )
This paper derives statistical guarantees for the performance of Graph Neural Networks (GNNs) in link prediction tasks on graphs generated by a graphon. We propose a linear GNN architecture (LG-GNN) that produces consistent estimators for the underlying edge probabilities. We establish a bound on the mean squared error and give guarantees on the ability of LG-GNN to detect high-probability edges. Our guarantees hold for both sparse and dense graphs. Finally, we demonstrate some of the shortcomings of the classical GCN architecture, as well as verify our results on real and synthetic datasets.
( 2
min )
Blanket statements of equivalence between causal concepts and purely probabilistic concepts should be approached with care. In this short note, I examine a recent claim that counterfactual fairness is equivalent to demographic parity. The claim fails to hold up upon closer examination. I will take the opportunity to address some broader misunderstandings about counterfactual fairness.
( 2
min )
Most existing federated learning (FL) methodologies have assumed training begins from a randomly initialized model. Recently, several studies have empirically demonstrated that leveraging a pre-trained model can offer advantageous initializations for FL. In this paper, we propose a collaborative pre-training approach, CoPreFL, which strategically designs a pre-trained model to serve as a good initialization for any downstream FL task. The key idea of our pre-training algorithm is a meta-learning procedure which mimics downstream distributed scenarios, enabling it to adapt to any unforeseen FL task. CoPreFL's pre-training optimization procedure also strikes a balance between average performance and fairness, with the aim of addressing these competing challenges in downstream FL tasks through intelligent initializations. Extensive experimental results validate that our pre-training method provides a robust initialization for any unseen downstream FL task, resulting in enhanced average performance and more equitable predictions.
( 2
min )
This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDP). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average reward CMDPs with a general policy parametrization. To address this challenge, we propose a primal dual based policy gradient algorithm that adeptly manages the constraints while ensuring a low regret guarantee toward achieving a global optimal policy. In particular, we demonstrate that our proposed algorithm achieves $\tilde{\mathcal{O}}({T}^{3/4})$ objective regret and $\tilde{\mathcal{O}}({T}^{3/4})$ constraint violation bounds.
( 2
min )
This post is co-written with Ilan Geller, Shuyu Yang and Richa Gupta from Accenture. Bringing innovative new pharmaceutical drugs to market is a long and stringent process. Companies face complex regulations and extensive approval requirements from governing bodies like the US Food and Drug Administration (FDA). A key part of the submission process is authoring […]
( 7
min )
Do your employees wait for hours on the telephone to open an IT ticket? Do they wait for an agent to triage an issue, which sometimes only requires restarting the computer? Providing excellent IT support is crucial for any organization, but legacy systems have relied heavily on human agents being available to intake reports and […]
( 13
min )
In this post, we show how to develop an ML-driven solution using Amazon SageMaker for detecting adverse events using the publicly available Adverse Drug Reaction Dataset on Hugging Face. In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the Pubmed dataset and performs the best out of those tried.
( 8
min )
The graduate students will aim to commercialize innovations in AI, machine learning, and data science.
( 6
min )
Mr_Vudoo is a digital renaissance man — a livestreamer, video editor, gamer and entertainer skilled in producing an array of content for his audience.
( 9
min )
Recently, there has been a growing interest in mixed-categorical metamodels based on Gaussian Process (GP) for Bayesian optimization. In this context, different approaches can be used to build the mixed-categorical GP. Many of these approaches involve a high number of hyperparameters; in fact, the more general and precise the strategy used to build the GP, the greater the number of hyperparameters to estimate. This paper introduces an innovative dimension reduction algorithm that relies on partial least squares regression to reduce the number of hyperparameters used to build a mixed-variable GP. Our goal is to generalize classical dimension reduction techniques commonly used within GP (for continuous inputs) to handle mixed-categorical inputs. The good potential of the proposed method is demonstrated in both structural and multidisciplinary application contexts. The targeted applications include the analysis of a cantilever beam as well as the optimization of a green aircraft, resulting in a significant 439-kilogram reduction in fuel consumption during a single mission.
( 2
min )
It is well established that to ensure or certify the robustness of a neural network, its Lipschitz constant plays a prominent role. However, its calculation is NP-hard. In this note, by taking into account activation regions at each layer as new constraints, we propose new quadratically constrained MIP formulations for the neural network Lipschitz estimation problem. The solutions of these problems give lower bounds and upper bounds of the Lipschitz constant and we detail conditions when they coincide with the exact Lipschitz constant.
( 2
min )
Compositional generalization is one of the main properties that differentiate lexical learning in humans from state-of-the-art neural networks. We propose a general framework for building models that can generalize compositionally using the concept of Generalized Grammar Rules (GGRs), a class of symmetry-based compositional constraints for transduction tasks, which we view as a transduction analogue of equivariance constraints in physics-inspired tasks. Besides formalizing generalized notions of symmetry for language transduction, our framework is general enough to contain many existing works as special cases. We present ideas on how GGRs might be implemented, and in the process draw connections to reinforcement learning and other areas of research.
( 2
min )
A major challenge in sample-based inference (SBI) for Bayesian neural networks is the size and structure of the networks' parameter space. Our work shows that successful SBI is possible by embracing the characteristic relationship between weight and function space, uncovering a systematic link between overparameterization and the difficulty of the sampling problem. Through extensive experiments, we establish practical guidelines for sampling and convergence diagnosis. As a result, we present a Bayesian deep ensemble approach as an effective solution with competitive performance and uncertainty quantification.
( 2
min )
In this work, we propose a model-agnostic instance-based post-hoc explainability method for time series classification. The proposed algorithm, namely Time-CF, leverages shapelets and TimeGAN to provide counterfactual explanations for arbitrary time series classifiers. We validate the proposed method on several real-world univariate time series classification tasks from the UCR Time Series Archive. The results indicate that the counterfactual instances generated by Time-CF, when compared to state-of-the-art methods, demonstrate better performance in terms of four explainability metrics: closeness, sensibility, plausibility, and sparsity.
( 2
min )
For autonomous mobile robots, uncertainties in the environment and system model can lead to failure in the motion planning pipeline, resulting in potential collisions. In order to achieve a high level of robust autonomy, these robots should be able to proactively predict and recover from such failures. To this end, we propose a Gaussian Process (GP) based model for proactively detecting the risk of future motion planning failure. When this risk exceeds a certain threshold, a recovery behavior is triggered that leverages the same GP model to find a safe state from which the robot may continue towards the goal. The proposed approach is trained in simulation only and can generalize to real world environments on different robotic platforms. Simulations and physical experiments demonstrate that our framework is capable of both predicting planner failures and recovering the robot to states where planner success is likely, all while producing agile motion.
( 2
min )
In molecular dynamics (MD) simulations, rare events, such as protein folding, are typically studied by means of enhanced sampling techniques, most of which rely on the definition of a collective variable (CV) along which the acceleration occurs. Obtaining an expressive CV is crucial, but often hindered by the lack of information about the particular event, e.g., the transition from unfolded to folded conformation. We propose a simulation-free data augmentation strategy using physics-inspired metrics to generate geodesic interpolations resembling protein folding transitions, thereby improving sampling efficiency without true transition state samples. Leveraging interpolation progress parameters, we introduce a regression-based learning scheme for CV models, which outperforms classifier-based methods when transition state data is limited and noisy.
( 2
min )
We seek to enable classic processing of continuous ultra-sparse spatiotemporal data generated by event-based sensors with dense machine learning models. We propose a novel hybrid pipeline composed of asynchronous sensing and synchronous processing that combines several ideas: (1) an embedding based on PointNet models -- the ALERT module -- that can continuously integrate new events and dismiss old ones thanks to a leakage mechanism, (2) a flexible readout of the embedded data that makes it possible to feed any downstream model with always up-to-date features at any sampling rate, (3) exploiting the input sparsity in a patch-based approach inspired by Vision Transformer to optimize the efficiency of the method. These embeddings are then processed by a transformer model trained for object and gesture recognition. Using this approach, we achieve state-of-the-art performance with a lower latency than competitors. We also demonstrate that our asynchronous model can operate at any desired sampling rate.
( 2
min )
Recent work has described the presence of the embedding gap in neural network verification. On one side of the gap is a high-level specification about the network's behaviour, written by a domain expert in terms of the interpretable problem space. On the other side is a logically equivalent set of satisfiability queries, expressed in the uninterpretable embedding space in a form suitable for neural network solvers. In this paper we describe an algorithm for compiling the former to the latter. We explore and overcome complications that arise from targeting neural network solvers as opposed to standard SMT solvers.
( 2
min )
Based on SGD, previous works have proposed many algorithms that have improved convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, Adam, etc. However, their convergence analysis under non-convex conditions is challenging. In this work, we propose a unified framework to address this issue. For any first-order method, we interpret the update direction $g_t$ as the sum of the stochastic subgradient $\nabla f_t(x_t)$ and an additional acceleration term $\frac{2|\langle v_t, \nabla f_t(x_t) \rangle|}{\|v_t\|_2^2} v_t$, so that convergence can be discussed by analyzing $\langle v_t, \nabla f_t(x_t) \rangle$. Through our framework, we have discovered two plug-and-play acceleration methods, Reject Accelerating and Random Vector Accelerating. We theoretically demonstrate that these two methods can directly lead to an improvement in convergence rate.
( 2
min )
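The decomposition of the update direction stated in the abstract above, $g_t = \nabla f_t(x_t) + \frac{2|\langle v_t, \nabla f_t(x_t) \rangle|}{\|v_t\|_2^2} v_t$, can be evaluated directly. The following is only a sketch of that single formula: the function name `update_direction` is hypothetical, vectors are plain Python lists, and returning the bare gradient when $v_t = 0$ is an assumption made here to avoid division by zero. It does not implement Reject Accelerating or Random Vector Accelerating, whose details the abstract does not give.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def update_direction(grad, v):
    """Return g = grad + (2 * |<v, grad>| / ||v||^2) * v.

    `grad` plays the role of the stochastic subgradient and `v` the
    role of the acceleration vector v_t from the abstract.
    """
    norm_sq = dot(v, v)
    if norm_sq == 0.0:
        # Assumption: with no acceleration vector, fall back to the gradient.
        return list(grad)
    coeff = 2.0 * abs(dot(v, grad)) / norm_sq
    return [g + coeff * vi for g, vi in zip(grad, v)]


# Toy values (hypothetical): <v, grad> = 1 and ||v||^2 = 2, so coeff = 1.
print(update_direction([1.0, 0.0], [1.0, 1.0]))  # [2.0, 1.0]
```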
Dose-Volume Histogram (DVH) prediction is fundamental in radiation therapy, facilitating treatment planning, dose evaluation, plan comparison, and more. It helps to increase the ability to deliver precise and effective radiation treatments while managing potential toxicities to healthy tissues as needed to reduce the risk of complications. This paper extends recently disclosed research findings presented at the AAPM 65th Annual Meeting & Exhibition and includes the necessary technical details. The objective is to design efficient deep learning models for DVH prediction on a general radiotherapy platform equipped with a high-performance CBCT system, where input CT images and target dose images to predict may have different origins, spacing and sizes. Deep learning models widely adopted in the DVH prediction task are evaluated on the novel radiotherapy platform, and graph neural networks (GNNs) are shown to be the ideal architecture to construct a plug-and-play framework to improve the predictive performance of base deep learning models in the adaptive setting.
( 2
min )
We make the case for neural network objects and extend an already existing neural network calculus explained in detail in Chapter 2 of \cite{bigbook}. Our aim will be to show that, yes, indeed, it makes sense to talk about neural network polynomials, neural network exponentials, sines, and cosines in the sense that they do indeed approximate their real number counterparts subject to limitations on certain of their parameters, $q$ and $\varepsilon$. While doing this, we show that the parameter and depth growth are only polynomial in their desired accuracy (defined as a 1-norm difference over $\mathbb{R}$), thereby showing that this approach to approximating, where a neural network in some sense has the structural properties of the function it is approximating, is not entirely intractable.
( 2
min )
This study introduces a two-scale Graph Neural Operator (GNO), namely, LatticeGraphNet (LGN), designed as a surrogate model for costly nonlinear finite-element simulations of three-dimensional latticed parts and structures. LGN has two networks: LGN-i, learning the reduced dynamics of lattices, and LGN-ii, learning the mapping from the reduced representation onto the tetrahedral mesh. LGN can predict deformation for arbitrary lattices, hence the name operator. Our approach significantly reduces inference time while maintaining high accuracy for unseen simulations, establishing the use of GNOs as efficient surrogate models for evaluating mechanical responses of lattices and structures.
( 2
min )
Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service for building and deploying machine learning (ML) models without the need to write any code. Ready-to-use Foundation Models (FMs) available in SageMaker Canvas enable customers to use generative AI for tasks such as content generation and summarization. We are thrilled to announce the latest […]
( 6
min )
This is a guest post co-authored by Ajay K Gupta, Jean Felipe Teotonio and Paul A Churchyard from HSR.health. HSR.health is a geospatial health risk analytics firm whose vision is that global health challenges are solvable through human ingenuity and the focused and accurate application of data analytics. In this post, we present one approach […]
( 13
min )
AI is reshaping industries, society and the “very fabric of innovation” — and Canada is poised to play a key role in this global transformation, said NVIDIA founder and CEO Jensen Huang during a fireside chat with leaders from across Canada’s thriving AI ecosystem. “Canada, as you know, even though you’re so humble, you might
( 6
min )
A new study underscores the potential of AI and accelerated computing to deliver energy efficiency and combat climate change, efforts in which NVIDIA has long been deeply engaged. The study, called “Rethinking Concerns About AI’s Energy Use,” provides a well-researched examination into how AI can — and in many cases already does — play a
( 7
min )
In July 2023, Teresa Tung, managing director and cloud-first chief technologist at Accenture, gave a Factory of the Future talk at the Databricks Data + AI Summit on digital twins, knowledge graphs, and generative AI for warehouse automation. Two points she made that resonated with me: 1) Digital twins are…
The post Digital twins, interoperability and FAIR model-driven development appeared first on Data Science Central.
( 22
min )
Let’s dive into the cloud, but not just any cloud: the cloud of the future, specifically the realm of cloud security in 2024. We’re not just talking about your everyday, run-of-the-mill updates here. We’re looking at the big players, the game changers, the trends that are going to set the stage for how we protect our…
The post 5 trends & advances that are set to define cloud security in 2024 appeared first on Data Science Central.
( 21
min )
Exploiting the symmetry within datasets, MIT researchers show, can decrease the amount of data needed for training neural networks.
( 7
min )
Dermatologists and general practitioners are somewhat less accurate in diagnosing disease in darker skin, a new study finds. Used correctly, AI may be able to help.
( 7
min )
AI Weirdness: the strange side of machine learning
( 2
min )
One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). In the RAG pattern, we find pieces of reference content related to an input prompt by performing similarity searches on embeddings. Embeddings capture the information content in bodies of text, allowing natural language processing (NLP) models to work with […]
( 18
min )
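The similarity search at the heart of the RAG pattern described in the teaser above can be sketched in a few lines. This is a minimal illustration, not the post's implementation: the three-dimensional embeddings, the passage texts, and the helper names `cosine_similarity` and `retrieve` are all hypothetical, and a real system would obtain embeddings from a model and typically use a vector index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, corpus, k=2):
    """Return the k passages whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical toy embeddings; real models use hundreds of dimensions.
corpus = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("return an item", [0.8, 0.2, 0.1]),
]
query = [1.0, 0.0, 0.0]
top = retrieve(query, corpus, k=2)
print(top)
```

The retrieved passages would then be placed into the prompt as reference content for the generative model.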
In this paper, we prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss. We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence. The result applies to a wide range of known function classes. In particular, while most previous works impose explicit smoothness assumptions on the regression function, our framework encompasses more general settings. The proposed neural networks are either the minimizers of the logistic loss or the $0$-$1$ loss. In the former case, they are interpolating classifiers that exhibit a benign overfitting behavior.
( 2
min )
Understanding Origin-Destination (O-D) travel demand is crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). The STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations using real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.
( 2
min )
We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE employs gradient descent iteration with Kullback-Leibler (KL) divergence as the objective function, aiming to identify a set of points that closely resemble the original data points in a high-dimensional space, minimizing KL divergence. Investigating t-SNE properties such as perplexity and affinity under a weak convergence assumption on the sampled dataset, we examine the behavior of points generated by t-SNE under continuous gradient flow. Demonstrating that points generated by t-SNE remain bounded, we leverage this insight to establish the existence of a minimizer for KL divergence.
( 2
min )
Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and employed in practice, the theoretical understandings about convergence properties of the probability flow ODE are still quite limited. In this paper, we provide the first non-asymptotic convergence analysis for a general class of probability flow ODE samplers in 2-Wasserstein distance, assuming accurate score estimates. We then consider various examples and establish results on the iteration complexity of the corresponding ODE-based samplers.
( 2
min )
We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal-based approaches, where a dataset is generated from private estimates of the marginals. In this paper, we introduce PrivPGD, a new generation method for marginal-based private data synthesis, leveraging tools from optimal transport and particle gradient descent. Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.
( 2
min )
Neutrinos can undergo fast flavor conversions (FFCs) within extremely dense astrophysical environments such as core-collapse supernovae (CCSNe) and neutron star mergers (NSMs). In this study, we explore FFCs in a \emph{multi-energy} neutrino gas, revealing that when the FFC growth rate significantly exceeds that of the vacuum Hamiltonian, all neutrinos (regardless of energy) share a common survival probability dictated by the energy-integrated neutrino spectrum. We then employ physics-informed neural networks (PINNs) to predict the asymptotic outcomes of FFCs within such a multi-energy neutrino gas. These predictions are based on the first two moments of neutrino angular distributions for each energy bin, typically available in state-of-the-art CCSN and NSM simulations. Our PINNs achieve errors as low as $\lesssim6\%$ and $\lesssim 18\%$ for predicting the number of neutrinos in the electron channel and the relative absolute error in the neutrino moments, respectively.
( 2
min )
Resilience plays a pivotal role in the development of any workload, and generative AI workloads are no different. There are unique considerations when engineering generative AI workloads through a resilience lens. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. In this post, we discuss the […]
( 8
min )
Data is the foundation for capturing the maximum value from AI technology and solving business problems quickly. To unlock the potential of generative AI technologies, however, there’s a key prerequisite: your data needs to be appropriately prepared. In this post, we describe how to use generative AI to update and scale your data pipeline using Amazon […]
( 6
min )
Embeddings play a key role in natural language processing (NLP) and machine learning (ML). Text embedding refers to the process of transforming text into numerical representations that reside in a high-dimensional vector space. This technique is achieved through the use of ML algorithms that enable the understanding of the meaning and context of data (semantic […]
( 9
min )
Image by Ahmad Ardity from Pixabay The good news is that the data science community is taking more of an interest in knowledge graphs lately. But unsurprisingly, some data science folks exploring graphs themselves are barely scratching the surface of knowledge graph potential. Until data scientists view the root problem to be solved through the…
The post What data scientists overlook when it comes to knowledge graphs appeared first on Data Science Central.
( 22
min )
GeForce NOW is celebrating its fourth anniversary all month (plus an extra day for leap year) during February’s GFN Thursdays, with 2 new games joining the cloud. Keep an eye out for more new games and other announcements for members to come. Diablo IV and Overwatch 2 heat up the cloud this GFN
( 7
min )
This study investigates self-supervised learning techniques to obtain representations of event sequences, a key modality in various applications, including but not limited to banking, e-commerce, and healthcare.
We perform a comprehensive study of generative and contrastive approaches in self-supervised learning, applying them both independently. We find that there is no single supreme method. Consequently, we explore the potential benefits of combining these approaches. To achieve this goal, we introduce a novel method that aligns generative and contrastive embeddings as distinct modalities, drawing inspiration from contemporary multimodal research.
Generative and contrastive approaches are often treated as mutually exclusive, leaving a gap for their combined exploration. Our results demonstrate that this aligned model performs at least on par with, and mostly surpasses, existing methods and is more universal across a variety of tasks. Furthermore, we demonstrate that self-supervised methods consistently outperform the supervised approach on our datasets.
( 2
min )
There are now many explainable AI methods for understanding the decisions of a machine learning model. Among these are methods based on counterfactual reasoning, which involve simulating feature changes and observing the impact on the prediction. This article proposes to view this simulation process as a source of knowledge that can be stored and used later in different ways. This process is illustrated for the additive model and, more specifically, for the naive Bayes classifier, whose useful properties for this purpose are shown.
( 2
min )
This paper explores optimal service resource management, a continuing challenge for health information services that seek to enhance service performance, optimise resource utilisation and deliver interactive health information. An adaptive optimal service resource management strategy was developed based on a value co-creation model in health information service, with a focus on collaboration and interaction with users. A deep reinforcement learning algorithm was embedded in the Internet of Things (IoT)-based health information service system (I-HISS) to allocate service resources by controlling service provision and service adaptation based on user engagement behaviour. Simulation experiments were conducted to evaluate the significance of the proposed algorithm under different user reactions to the health information service.
( 2
min )
Prompt design and engineering has become an important discipline in just the past few months. In this paper, we provide an introduction to the main concepts and design approaches. We also provide more advanced techniques all the way to those needed to design LLM-based agents. We finish by providing a list of existing tools for prompt engineering.
( 2
min )
Quantum computing shows great potential, but errors pose a significant challenge. This study explores new strategies for mitigating quantum errors using artificial neural networks (ANN) and the Yang-Baxter equation (YBE). Unlike traditional error correction methods, which are computationally intensive, we investigate artificial error mitigation. The manuscript introduces the basics of quantum error sources and explores the potential of using classical computation for error mitigation. The Yang-Baxter equation plays a crucial role, allowing us to compress time dynamics simulations into constant-depth circuits. By introducing controlled noise through the YBE, we enhance the dataset for error mitigation. We train an ANN model on partial data from quantum simulations, demonstrating its effectiveness in correcting errors in time-evolving quantum states.
( 2
min )
Active learning strategies for 3D object detection in autonomous driving datasets may help to address challenges of data imbalance, redundancy, and high-dimensional data. We demonstrate the effectiveness of entropy querying to select informative samples, aiming to reduce annotation costs and improve model performance. We experiment using the BEVFusion model for 3D object detection on the nuScenes dataset, comparing active learning to random sampling and demonstrating that entropy querying outperforms random sampling in most cases. The method is particularly effective in reducing the performance gap between majority and minority classes. Class-specific analysis reveals efficient allocation of annotated resources for limited data budgets, emphasizing the importance of selecting diverse and informative data for model training. Our findings suggest that entropy querying is a promising strategy for selecting data that enhances model learning in resource-constrained environments.
( 2
min )
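The entropy querying mentioned in the active-learning abstract above can be illustrated with a small sketch. It assumes each unlabeled sample comes with one vector of predicted class probabilities; how the paper aggregates the many detections in a 3D scene is not stated in the abstract, so the sample ids, probabilities, and the helper `select_by_entropy` are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_by_entropy(predictions, budget):
    """Return the ids of the `budget` samples with the most uncertain predictions.

    predictions maps a sample id to its predicted class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]

# Hypothetical model outputs for three unlabeled samples.
unlabeled = {
    "scene_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "scene_b": [0.40, 0.35, 0.25],  # near-uniform prediction, high entropy
    "scene_c": [0.70, 0.20, 0.10],
}
to_annotate = select_by_entropy(unlabeled, budget=2)
print(to_annotate)
```

Random sampling, the baseline named in the abstract, would instead draw `budget` ids uniformly without looking at the predictions.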
As legal case law databases such as HUDOC continue to grow rapidly, it has become essential for legal researchers to find efficient methods to handle such large-scale data sets. Such case law databases usually consist of the textual content of cases together with the citations between them. This paper focuses on case law from the European Court of Human Rights on Article 8 of the European Convention on Human Rights, the right to respect for private and family life, home and correspondence. In this study, we demonstrate and compare the potential of topic modelling and citation network analysis to find and organize case law on Article 8 based on their general themes and citation patterns, respectively. Additionally, we explore whether combining these two techniques leads to better results compared to the application of only one of the methods. We evaluate the effectiveness of the combined method on a unique manually collected and annotated dataset of Article 8 case law on evictions. The results of our experiments show that our combined (text and citation-based) approach provides the best results in finding and grouping case law, providing scholars with an effective way to extract and analyse relevant cases on a specific issue.
( 3
min )
This paper presents a comprehensive comparative analysis of explainable artificial intelligence (XAI) ensembling methods. Our research brings three significant contributions. Firstly, we introduce a novel ensembling method, NormEnsembleXAI, that leverages minimum, maximum, and average functions in conjunction with normalization techniques to enhance interpretability. Secondly, we offer insights into the strengths and weaknesses of XAI ensemble methods. Lastly, we provide a library, facilitating the practical implementation of XAI ensembling, thus promoting the adoption of transparent and interpretable deep learning models.
( 2
min )
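The general idea of combining normalization with minimum, maximum, and average aggregation, as named in the XAI ensembling abstract above, might look like the sketch below. It is not the NormEnsembleXAI library or the authors' algorithm: min-max scaling is one assumed choice of normalization, and the function names and attribution values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a list of attribution scores to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input carries no ranking information
    return [(v - lo) / (hi - lo) for v in values]

def ensemble(attributions, aggregate="avg"):
    """Normalize each method's attributions, then combine them feature by feature."""
    normalized = [min_max_normalize(a) for a in attributions]
    columns = list(zip(*normalized))  # one tuple of scores per feature
    if aggregate == "min":
        return [min(c) for c in columns]
    if aggregate == "max":
        return [max(c) for c in columns]
    if aggregate == "avg":
        return [sum(c) / len(c) for c in columns]
    raise ValueError(f"unknown aggregate: {aggregate}")

# Hypothetical attributions from two explanation methods on different scales.
method_a = [0.0, 2.0, 4.0]
method_b = [10.0, 30.0, 20.0]
combined = ensemble([method_a, method_b], aggregate="avg")
print(combined)
```

Normalizing first keeps a method with a larger numeric range from dominating the aggregate.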
This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, the typical quality statements such as accuracy and precision of these models and systems can be verified independently, taking into account their black-box character and the inherent stochastic properties of ML models and their training data. The article presents first results from a set of test experiments and suggests extensions to existing test methods reflecting the stochastic nature of ML models and ML-based systems.
( 2
min )
This paper studies Bayesian optimization with noise-free observations. We introduce new algorithms rooted in scattered data approximation that rely on a random exploration step to ensure that the fill-distance of query points decays at a near-optimal rate. Our algorithms retain the ease of implementation of the classical GP-UCB algorithm and satisfy cumulative regret bounds that nearly match those conjectured in arXiv:2002.05096, hence solving a COLT open problem. Furthermore, the new algorithms outperform GP-UCB and other popular Bayesian optimization strategies in several examples.
( 2
min )
Deep learning models have demonstrated promising results in estimating treatment effects (TEE). However, most of them overlook the variations in treatment outcomes among subgroups with distinct characteristics. This limitation hinders their ability to provide accurate estimations and treatment recommendations for specific subgroups. In this study, we introduce a novel neural network-based framework, named SubgroupTE, which incorporates subgroup identification and treatment effect estimation. SubgroupTE identifies diverse subgroups and simultaneously estimates treatment effects for each subgroup, improving the treatment effect estimation by considering the heterogeneity of treatment responses. Comparative experiments on synthetic data show that SubgroupTE outperforms existing models in treatment effect estimation. Furthermore, experiments on a real-world dataset related to opioid use disorder (OUD) demonstrate the potential of our approach to enhance personalized treatment recommendations for OUD patients.
( 2
min )
The world is looking for clean and renewable energy sources that do not pollute the environment, in an attempt to reduce the greenhouse gas emissions that contribute to global warming. Wind energy has significant potential not only to reduce greenhouse gas emissions but also to meet the ever-increasing demand for energy. To enable the effective utilization of wind energy, it is crucial to address three challenges in wind data analysis. The first is improving data resolution in various climate conditions to ensure an ample supply of information for assessing potential energy resources. The second is implementing dimensionality reduction techniques for data collected from sensors and simulations to efficiently manage and store large datasets. The third is extrapolating wind data from one spatial specification to another, particularly in cases where data acquisition may be impractical or costly. We propose a deep learning based approach to achieve multi-modal continuous resolution wind data prediction from discontinuous wind data, along with data dimensionality reduction.
( 2
min )
Activity detection is an important task in next-generation grant-free multiple access. While there are a number of existing algorithms designed for this purpose, they mostly require precise information about the network, such as large-scale fading coefficients, small-scale fading channel statistics, noise variance at the access points, and user activity probability. Acquiring this information incurs significant overhead, and the estimated values might not be accurate. This problem is even more severe in cell-free networks, as there are many of these parameters to be acquired. Therefore, this paper sets out to investigate the activity detection problem without the above-mentioned information. In order to handle so many unknown parameters, this paper employs the Bayesian approach, where the unknown variables are endowed with prior distributions which effectively act as regularizations. Together with the likelihood function, a maximum a posteriori (MAP) estimator and a variational inference algorithm are derived. Extensive simulations demonstrate that the proposed methods, even without the knowledge of these system parameters, perform better than existing state-of-the-art methods, such as covariance-based and approximate message passing methods.
( 2
min )
Deep Neural Network (DNN) models deployed on devices as inference engines are susceptible to Fault Injection Attacks (FIAs), which manipulate model parameters to disrupt inference execution and severely degrade performance. This work introduces Contrastive Learning (CL) of visual representations, a self-supervised learning approach, into the deep learning training and inference pipeline to implement DNN inference engines with self-resilience under FIAs. Our proposed CL-based FIA Detection and Recovery (CFDR) framework features (i) real-time detection with only a single batch of testing data and (ii) fast recovery that is effective even with only a small amount of unlabeled testing data. Evaluated with the CIFAR-10 dataset on multiple types of FIAs, our CFDR shows promising detection and recovery effectiveness.
( 2
min )
This paper introduces the multivariate beta mixture model (MBMM), a new probabilistic model for soft clustering. MBMM adapts to diverse cluster shapes because of the flexible probability density function of the multivariate beta distribution. We introduce the properties of MBMM, describe the parameter learning procedure, and present the experimental results, showing that MBMM fits diverse cluster shapes on synthetic and real datasets. The code is released anonymously at \url{https://github.com/hhchen1105/mbmm/}.
( 2
min )
This work undertakes studies to evaluate interpretability methods for time-series deep learning. Sensitivity analysis assesses how input changes affect the output, constituting a key component of interpretation. Among the post-hoc interpretation methods such as back-propagation, perturbation, and approximation, my work will investigate perturbation-based sensitivity analysis methods on modern Transformer models to benchmark their performance. Specifically, my work answers three research questions: 1) Do different sensitivity analysis (SA) methods yield comparable outputs and attribute importance rankings? 2) Using the same sensitivity analysis method, do different Deep Learning (DL) models impact the output of the sensitivity analysis? 3) How well do the results from sensitivity analysis methods align with the ground truth?
( 2
min )
This paper presents a modeling effort to explore the underlying physics of temperature evolution during additive friction stir deposition (AFSD) by a human-AI teaming approach. AFSD is an emerging solid-state additive manufacturing technology that deposits materials without melting. However, both process modeling and modeling of the AFSD tool are at an early stage. In this paper, a human-AI teaming approach is proposed to combine models based on first principles with AI. The resulting human-informed machine learning method, denoted as AFSD-Physics, can effectively learn the governing equations of temperature evolution at the tool and the build from in-process measurements. Experiments are designed and conducted to collect in-process measurements for the deposition of aluminum 7075 with a total of 30 layers. The acquired governing equations are physically interpretable models with low computational cost and high accuracy. Model predictions show good agreement with the measurements. Experimental validation with new process parameters demonstrates the model's generalizability and potential for use in tool temperature control and process optimization.
( 2
min )
A common forecasting setting in real world applications considers a set of possibly heterogeneous time series of the same domain. Due to different properties of each time series such as length, obtaining forecasts for each individual time series in a straight-forward way is challenging. This paper proposes a general framework utilizing a similarity measure in Dynamic Time Warping to find similar time series to build neighborhoods in a k-Nearest Neighbor fashion, and improve forecasts of possibly simple models by averaging. Several ways of performing the averaging are suggested, and theoretical arguments underline the usefulness of averaging for forecasting. Additionally, diagnostics tools are proposed allowing a deep understanding of the procedure.
( 2
min )
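The neighbourhood idea in the forecasting abstract above (Dynamic Time Warping as the similarity measure, k nearest neighbours, then averaging) can be sketched as follows. The abstract says several ways of averaging are suggested; this sketch assumes the simplest one, an unweighted mean over the target series and its k neighbours, and the series, base forecasts, and function names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def neighbourhood_forecast(target, forecasts, series, k=2):
    """Average the target's forecast with those of its k DTW-nearest series."""
    others = [name for name in series if name != target]
    others.sort(key=lambda name: dtw_distance(series[target], series[name]))
    members = [target] + others[:k]
    return sum(forecasts[name] for name in members) / len(members)

# Hypothetical series of different lengths and their one-step base forecasts.
series = {
    "a": [1, 2, 3, 4],
    "b": [1, 2, 2, 3, 4],
    "c": [10, 9, 8],
    "d": [1, 3, 4],
}
forecasts = {"a": 5.0, "b": 5.5, "c": 7.0, "d": 4.5}
averaged = neighbourhood_forecast("a", forecasts, series, k=2)
print(averaged)
```

DTW is used here because it compares series of unequal length, which matches the heterogeneous setting the abstract describes.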
Microsoft Research Forum is a new series of conversations that explore recent advances, bold new ideas, and important discussions within the global research community. Leading Microsoft researchers will share insights into their work, followed by live online discussions with audience participants. This post provides an overview of the inaugural Microsoft Research […]
The post Microsoft Research Forum: New series explores bold ideas in technology research in the era of AI appeared first on Microsoft Research.
( 11
min )
In this post, we show you how to securely create a movie chatbot by implementing RAG with your own data using Knowledge Bases for Amazon Bedrock. We use the IMDb and Box Office Mojo dataset to simulate a catalog for media and entertainment customers and showcase how you can build your own RAG solution in just a couple of steps.
( 7
min )
This post was co-written with Ricardo Perdigao, Solution Architecture Manager at Mendix, a Siemens business. Mendix, a Siemens business, offers the low-code platform with the vision and execution designed for today’s complex software development challenges. Since 2005, we’ve helped thousands of organizations worldwide reimagine how they develop applications with our platform’s cutting-edge capabilities. Mendix allows […]
( 8
min )
In the first part of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case. In this post, we present an approach to develop a deep learning-based computer vision model to […]
( 13
min )
Data governance is more important than ever in e-commerce, where massive amounts of data are generated and processed daily. Big Data presents opportunities and challenges for e-commerce businesses, requiring a strategic approach to data quality, security, and compliance. This article discusses e-commerce data governance best practices, including understanding data governance, data quality, data security, compliance…
The post Mastering E-commerce data governance: Best practices, challenges, and future trends for quality, compliance, and growth appeared first on Data Science Central.
( 27
min )
Here’s some news to still beating hearts: AI is helping bring some clarity to cardiology. Caristo Diagnostics has developed an AI-powered solution for detecting coronary inflammation in cardiac CT scans. In this episode of NVIDIA’s AI Podcast, Dr. Keith Channon, the Field Marshal Earl Alexander Professor at the University of Oxford, and the cofounder and
( 5
min )
Asia’s lion city is roaring ahead in AI. Singtel, a leading communications services provider based in Singapore, will bring the NVIDIA AI platform to businesses in the island nation and beyond. The mobile and broadband company is building energy-efficient data centers across Southeast Asia accelerated with NVIDIA Hopper architecture GPUs and using NVIDIA AI reference
( 6
min )
We’re developing a blueprint for evaluating the risk that a large language model (LLM) could aid someone in creating a biological threat. In an evaluation involving both biology experts and students, we found that GPT-4 provides at most a mild uplift in biological threat creation accuracy. While this uplift is not large enough to be conclusive, our finding is a starting point for continued research and community deliberation.
( 20
min )
Recent advancements in biological research leverage the integration of
molecules, proteins, and natural language to enhance drug discovery. However,
current models exhibit several limitations, such as the generation of invalid
molecular SMILES, underutilization of contextual information, and equal
treatment of structured and unstructured knowledge. To address these issues, we
propose $\mathbf{BioT5}$, a comprehensive pre-training framework that enriches
cross-modal integration in biology with chemical knowledge and natural language
associations. $\mathbf{BioT5}$ utilizes SELFIES for $100\%$ robust molecular
representations and extracts knowledge from the surrounding context of
bio-entities in unstructured biological literature. Furthermore,
$\mathbf{BioT5}$ distinguishes between structured and unstructured knowledge,
leading to more effective utilization of information. After fine-tuning, BioT5
shows superior performance across a wide range of tasks, demonstrating its
strong capability of capturing underlying relations and properties of
bio-entities. Our code is available at
$\href{https://github.com/QizhiPei/BioT5}{Github}$.
( 2
min )
Rapid advancements in artificial intelligence (AI) technology have brought
about a plethora of new challenges in terms of governance and regulation. AI
systems are being integrated into various industries and sectors, creating a
demand from decision-makers to possess a comprehensive and nuanced
understanding of the capabilities and limitations of these systems. One
critical aspect of this demand is the ability to explain the results of machine
learning models, which is crucial to promoting transparency and trust in AI
systems, as well as fundamental in helping machine learning models to be
trained ethically. In this paper, we present novel metrics to quantify the
degree to which AI model predictions can be easily explained by their features.
Our metrics summarize different aspects of explainability into scalars,
providing a more comprehensive understanding of model predictions and
facilitating communication between decision-makers and stakeholders, thereby
increasing the overall transparency and accountability of AI systems.
( 2
min )
Besides training, mathematical optimization is also used in deep learning to
model and solve formulations over trained neural networks for purposes such as
verification, compression, and optimization with learned constraints. However,
solving these formulations soon becomes difficult as the network size grows due
to the weak linear relaxation and dense constraint matrix. We have seen
improvements in recent years with cutting plane algorithms, reformulations, and
a heuristic based on Mixed-Integer Linear Programming (MILP). In this work, we
propose a more scalable heuristic based on exploring global and local linear
relaxations of the neural network model. Our heuristic is competitive with a
state-of-the-art MILP solver and the prior heuristic while producing better
solutions with increases in input, depth, and number of neurons.
( 2
min )
Recently, Deep Convolutional Neural Networks (DCNNs) including the ResNet-20
architecture have been privately evaluated on encrypted, low-resolution data
with the Residue-Number-System Cheon-Kim-Kim-Song (RNS-CKKS) homomorphic
encryption scheme. We extend methods for evaluating DCNNs on images with larger
dimensions and many channels, beyond what can be stored in single ciphertexts.
Additionally, we simplify and improve the efficiency of the recently introduced
multiplexed image format, demonstrating that homomorphic evaluation can work
with standard, row-major matrix packing and results in encrypted inference time
speedups by $4.6-6.5\times$. We also show how existing DCNN models can be
regularized during the training process to further improve efficiency and
accuracy. These techniques are applied to homomorphically evaluate a DCNN with
high accuracy on the high-resolution ImageNet dataset, achieving $80.2\%$ top-1
accuracy. We also achieve an accuracy of homomorphically evaluated CNNs on the
CIFAR-10 dataset of $98.3\%$.
( 2
min )
In this paper we prove Gamma-convergence of a nonlocal perimeter of Minkowski
type to a local anisotropic perimeter. The nonlocal model describes the
regularizing effect of adversarial training in binary classifications. The
energy essentially depends on the interaction between two distributions
modelling likelihoods for the associated classes. We overcome typical strict
regularity assumptions for the distributions by only assuming that they have
bounded $BV$ densities. In the natural topology coming from compactness, we
prove Gamma-convergence to a weighted perimeter with weight determined by an
anisotropic function of the two densities. Despite being local, this sharp
interface limit reflects classification stability with respect to adversarial
perturbations. We further apply our results to deduce Gamma-convergence of the
associated total variations, to study the asymptotics of adversarial training,
and to prove Gamma-convergence of graph discretizations for the nonlocal
perimeter.
( 2
min )
We introduce NeuroSynt, a neuro-symbolic portfolio solver framework for
reactive synthesis. At the core of the solver lies a seamless integration of
neural and symbolic approaches to solving the reactive synthesis problem. To
ensure soundness, the neural engine is coupled with model checkers verifying
the predictions of the underlying neural models. The open-source implementation
of NeuroSynt provides an integration framework for reactive synthesis in which
new neural and state-of-the-art symbolic approaches can be seamlessly
integrated. Extensive experiments demonstrate its efficacy in handling
challenging specifications, enhancing the state-of-the-art reactive synthesis
solvers, with NeuroSynt contributing novel solves in the current SYNTCOMP
benchmarks.
( 2
min )
Federated Learning (FL) is a machine learning approach that addresses privacy
and data transfer costs by computing data at the source. It's particularly
popular for Edge and IoT applications where the aggregator server of FL is in
resource-capped edge data centers for reducing communication costs. Existing
cloud-based aggregator solutions are resource-inefficient and expensive at the
Edge, leading to low scalability and high latency. To address these challenges,
this study compares prior and new aggregation methodologies under the changing
demands of IoT and Edge applications. This work is the first to propose an
adaptive FL aggregator at the Edge, enabling users to manage the cost and
efficiency trade-off. An extensive comparative analysis demonstrates that the
design improves scalability by up to 4X, time efficiency by 8X, and reduces
costs by more than 2X compared to extant cloud-based static methodologies.
( 2
min )
We introduce the higher-order refactoring problem, where the goal is to
compress a logic program by discovering higher-order abstractions, such as map,
filter, and fold. We implement our approach in Stevie, which formulates the
refactoring problem as a constraint optimisation problem. Our experiments on
multiple domains, including program synthesis and visual reasoning, show that
refactoring can improve the learning performance of an inductive logic
programming system, specifically improving predictive accuracies by 27% and
reducing learning times by 47%. We also show that Stevie can discover
abstractions that transfer to multiple domains.
( 2
min )
Large language models (LLMs) such as GPT-3.5 and CodeLlama are powerful
models for code generation and understanding. Fine-tuning these models comes
with a high computational cost and requires a large labeled dataset.
Alternatively, in-context learning techniques allow models to learn downstream
tasks with only a few examples. Recently, researchers have shown how in-context
learning performs well in bug detection and repair. In this paper, we propose a
code-pair classification task in which both the buggy and non-buggy versions
are given to the model, and the model identifies the buggy one. We evaluate
our task on a real-world bug detection dataset with two of the most powerful
LLMs. Our experiments indicate that an LLM can often pick the buggy from the
non-buggy version of the code, and that the code-pair classification task is
much easier than being given a single snippet and deciding if and where a bug
exists.
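As a rough illustration of the code-pair setup described in this abstract, the sketch below builds a prompt from two versions of a function and parses a one-letter reply. The prompt wording and the helper names `build_code_pair_prompt` and `parse_choice` are assumptions made for illustration; the abstract does not give the authors' prompt, and no model API is called here.

```python
from typing import Optional


def build_code_pair_prompt(version_a: str, version_b: str) -> str:
    """Build a prompt showing two versions of the same code.

    Assumption: exactly one of the two versions is buggy.
    """
    return (
        "Two versions of the same function are shown below. "
        "Exactly one of them contains a bug.\n\n"
        f"Version A:\n{version_a}\n\n"
        f"Version B:\n{version_b}\n\n"
        "Answer with a single letter, A or B, naming the buggy version."
    )


def parse_choice(reply: str) -> Optional[str]:
    """Return 'A' or 'B' if the reply is exactly that letter, else None."""
    text = reply.strip().strip(".").upper()
    return text if text in ("A", "B") else None
```

The strict parser returns None for anything other than a bare letter, so ambiguous replies can be counted separately instead of being guessed.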
( 2
min )
Premise selection is a fundamental problem of automated theorem proving.
Previous works often use intricate symbolic methods, rely on domain knowledge,
and require significant engineering effort to solve this task. In this work, we
show that Magnushammer, a neural transformer-based approach, can outperform
traditional symbolic systems by a large margin. Tested on the PISA benchmark,
Magnushammer achieves $59.5\%$ proof rate compared to a $38.3\%$ proof rate of
Sledgehammer, the most mature and popular symbolic-based solver. Furthermore,
by combining Magnushammer with a neural formal prover based on a language
model, we significantly improve the previous state-of-the-art proof rate from
$57.0\%$ to $71.0\%$.
( 2
min )
Graph Neural Networks are notorious for their memory consumption. A recent
Transformer-based GNN called Graph Transformer (GT) has been shown to obtain
superior performance when long-range dependencies exist. However, combining
graph data with the Transformer architecture leads to an even more severe
memory issue. We propose a novel version of the "edge regularization technique"
that alleviates the need for Positional Encoding and ultimately alleviates GT's
out-of-memory issue.
We observe that it is not clear whether having an edge regularization on top of
positional encoding is helpful. However, it seems evident that applying our
edge regularization technique indeed stably improves GT's performance compared
to GT without Positional Encoding.
( 2
min )
Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient
fine-tuning of large language models (LLMs). However, fine-tuned LLMs often
become overconfident, especially when fine-tuned on small datasets. Bayesian
methods, with their inherent ability to estimate uncertainty, serve as potent
tools to mitigate overconfidence and enhance calibration. In this work, we
introduce Laplace-LoRA, which applies a Bayesian approach to the LoRA
parameters. Specifically, Laplace-LoRA applies a Laplace approximation to the
posterior over the LoRA parameters, considerably improving the calibration of
fine-tuned LLMs.
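For orientation, the generic Laplace approximation that this abstract refers to can be written as below. Here $\theta$ stands for the LoRA parameters, $\mathcal{D}$ for the fine-tuning data, and $\theta_{\mathrm{MAP}}$ for the fine-tuned (maximum a posteriori) parameters. This notation is an assumption, and the abstract does not say how the Hessian $H$ is computed or approximated.

```latex
p(\theta \mid \mathcal{D}) \approx
  \mathcal{N}\!\left(\theta;\ \theta_{\mathrm{MAP}},\ H^{-1}\right),
\qquad
H = -\nabla_\theta^2 \log p(\theta \mid \mathcal{D})
    \Big|_{\theta = \theta_{\mathrm{MAP}}}
```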
( 2
min )
Long-term fetal heart rate (FHR) monitoring during the antepartum period,
increasingly popularized by electronic FHR monitoring, represents a growing
approach in FHR monitoring. This kind of continuous monitoring, in contrast to
the short-term one, collects an extended period of fetal heart data. This
offers a more comprehensive understanding of the fetus's condition. However, the
interpretation of long-term antenatal fetal heart monitoring is still in its
early stages, lacking corresponding clinical standards. Furthermore, the
substantial amount of data generated by continuous monitoring imposes a
significant burden on clinical work when analyzed manually. To address the above
challenges, this study develops an automatic analysis system named LARA
(Long-term Antepartum Risk Analysis system) for continuous FHR monitoring,
combining deep learning and information fusion methods. LARA's core is a
well-established convolutional neural network (CNN) model. It processes
long-term FHR data as input and generates a Risk Distribution Map (RDM) and
Risk Index (RI) as the analysis results. We evaluate LARA on an internal test
dataset; the performance metrics are as follows: AUC 0.872, accuracy 0.816,
specificity 0.811, sensitivity 0.806, precision 0.271, and F1 score 0.415. In
our study, we observe that long-term FHR monitoring data with higher RI is more
likely to result in adverse outcomes (p=0.0021). In conclusion, this study
introduces LARA, the first automated analysis system for long-term FHR
monitoring, opening the way for further exploration of its clinical value.
( 3
min )
We study the problem of in-context learning (ICL) with large language models
(LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak
or regurgitate the private examples demonstrated in the prompt. We propose a
novel algorithm that generates synthetic few-shot demonstrations from the
private dataset with formal differential privacy (DP) guarantees, and show
empirically that it can achieve effective ICL. We conduct extensive experiments
on standard benchmarks and compare our algorithm with non-private ICL and
zero-shot solutions. Our results demonstrate that our algorithm can achieve
competitive performance with strong privacy levels. These results open up new
possibilities for ICL with privacy protection for a broad range of
applications.
( 2
min )
In the rapidly evolving field of machine learning, adversarial attacks
present a significant challenge to model robustness and security.
Decision-based attacks, which only require feedback on the decision of a model
rather than detailed probabilities or scores, are particularly insidious and
difficult to defend against. This work introduces L-AutoDA (Large Language
Model-based Automated Decision-based Adversarial Attacks), a novel approach
leveraging the generative capabilities of Large Language Models (LLMs) to
automate the design of these attacks. By iteratively interacting with LLMs in
an evolutionary framework, L-AutoDA automatically designs competitive attack
algorithms efficiently without much human effort. We demonstrate the efficacy
of L-AutoDA on the CIFAR-10 dataset, showing significant improvements over baseline
methods in both success rate and computational efficiency. Our findings
underscore the potential of language models as tools for adversarial attack
generation and highlight new avenues for the development of robust AI systems.
( 2
min )
This paper reports on the design and results of the 2024 ICASSP SP Cadenza
Challenge: Music Demixing/Remixing for Hearing Aids. The Cadenza project is
working to enhance the audio quality of music for those with a hearing loss.
The scenario for the challenge was listening to stereo reproduction over
loudspeakers via hearing aids. The task was to decompose pop/rock music into
vocal, drums, bass and other (VDBO) tracks, rebalance the different tracks with
specified gains, and then remix back to stereo. End-to-end approaches were
also accepted. 17 systems were submitted by 11 teams. Causal systems performed
worse than non-causal approaches. 9 systems beat the baseline. A common
approach was to fine-tune pretrained demixing models. The best approach used
an ensemble of models.
( 2
min )
We study a regularized interacting particle method for computing aggregation
patterns and near singular solutions of a Keller-Segel (KS) chemotaxis system
in two and three space dimensions, then further develop the DeepParticle (DP)
method to learn and generate solutions under variations of physical parameters.
The KS solutions are approximated as empirical measures of particles which
self-adapt to the high gradient part of solutions. We utilize the
expressiveness of deep neural networks (DNNs) to represent the transform of
samples from a given initial (source) distribution to a target distribution at
finite time T prior to blowup without assuming invertibility of the transforms.
In the training stage, we update the network weights by minimizing a discrete
2-Wasserstein distance between the input and target empirical measures. To
reduce computational cost, we develop an iterative divide-and-conquer algorithm
to find the optimal transition matrix in the Wasserstein distance. We present
numerical results of the DP framework for successful learning and generation of KS
dynamics in the presence of laminar and chaotic flows. The physical parameter
in this work is either the small diffusivity of chemo-attractant or the
reciprocal of the flow amplitude in the advection-dominated regime.
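To make the discrete 2-Wasserstein distance mentioned in this abstract concrete, here is a minimal sketch for the one-dimensional special case with equal-size, equally weighted samples, where sorting both samples gives the optimal matching. The paper works in two and three dimensions, where an optimal transition matrix has to be searched for; that search is not reproduced here. The function name `w2_distance_1d` is hypothetical.

```python
import math
from typing import Sequence


def w2_distance_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """2-Wasserstein distance between two 1D empirical measures.

    Assumes both samples have the same size and uniform weights; in that
    case matching the sorted samples is an optimal transport plan.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    squared = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(squared / len(xs))
```

For example, `w2_distance_1d([0.0, 1.0], [1.0, 2.0])` returns 1.0, since every particle is moved by exactly one unit.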
( 2
min )
Within cardiovascular disease detection using deep learning applied to ECG
signals, the complexities of handling physiological signals have sparked
growing interest in leveraging deep generative models for effective data
augmentation. In this paper, we introduce a novel versatile approach based on
denoising diffusion probabilistic models for ECG synthesis, addressing three
scenarios: (i) heartbeat generation, (ii) partial signal imputation, and (iii)
full heartbeat forecasting. Our approach presents the first generalized
conditional approach for ECG synthesis, and our experimental results
demonstrate its effectiveness for various ECG-related tasks. Moreover, we show
that our approach outperforms other state-of-the-art ECG generative models and
can enhance the performance of state-of-the-art classifiers.
( 2
min )
In recent years, there has been a noticeable increase in cyberattacks using
ransomware. Attackers use this malicious software to break into networks and
harm computer systems. This has caused significant and lasting damage to
various organizations, including government, private companies, and regular
users. These attacks often lead to the loss or exposure of sensitive
information, disruptions in normal operations, and persistent vulnerabilities.
This paper focuses on a method for recognizing and identifying ransomware in
computer networks. The approach relies on using machine learning algorithms and
analyzing the patterns of network traffic. By collecting and studying this
traffic, and then applying machine learning models, we can accurately identify
and detect ransomware. The results of implementing this method show that
machine learning algorithms can effectively pinpoint ransomware based on
network traffic, achieving high levels of precision and accuracy.
( 2
min )
In-context learning (ICL) suffers from oversensitivity to the prompt, making
it unreliable in real-world scenarios. We study the sensitivity of ICL with
respect to multiple perturbation types. First, we find that label bias obscures
the true sensitivity, and therefore prior work may have significantly
underestimated ICL sensitivity. Second, we observe a strong negative
correlation between ICL sensitivity and accuracy: predictions sensitive to
perturbations are less likely to be correct. Motivated by these findings, we
propose \textsc{SenSel}, a few-shot selective prediction method that abstains
from sensitive predictions. Experiments on ten classification datasets show
that \textsc{SenSel} consistently outperforms two commonly used
confidence-based and entropy-based baselines on abstention decisions.
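The idea of abstaining from sensitive predictions can be sketched as follows. The abstract does not specify how sensitivity is scored or thresholded, so the scoring rule here (the fraction of perturbed prompts whose prediction differs from the original) and the names `sensitivity`, `predict_or_abstain` and `max_sensitivity` are assumptions for illustration, not the SenSel method itself.

```python
from typing import Hashable, Optional, Sequence


def sensitivity(original_pred: Hashable, perturbed_preds: Sequence[Hashable]) -> float:
    """Fraction of perturbed-prompt predictions that differ from the original."""
    if not perturbed_preds:
        raise ValueError("need at least one perturbed prediction")
    changed = sum(1 for p in perturbed_preds if p != original_pred)
    return changed / len(perturbed_preds)


def predict_or_abstain(
    original_pred: Hashable,
    perturbed_preds: Sequence[Hashable],
    max_sensitivity: float = 0.2,  # hypothetical threshold
) -> Optional[Hashable]:
    """Return the original prediction, or None (abstain) if it is too sensitive."""
    if sensitivity(original_pred, perturbed_preds) > max_sensitivity:
        return None
    return original_pred
```

In this sketch the perturbed predictions would come from re-running the same query with perturbed prompts, for instance reordered or reworded demonstrations.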
( 2
min )
Existing analyses of the expressive capacity of Transformer models have
required excessively deep layers for data memorization, leading to a
discrepancy with the Transformers actually used in practice. This is primarily
due to the interpretation of the softmax function as an approximation of the
hardmax function. By clarifying the connection between the softmax function and
the Boltzmann operator, we prove that a single layer of self-attention with
low-rank weight matrices possesses the capability to perfectly capture the
context of an entire input sequence. As a consequence, we show that one-layer
and single-head Transformers have a memorization capacity for finite samples,
and that Transformers consisting of one self-attention layer with two
feed-forward neural networks are universal approximators for continuous
permutation equivariant functions on a compact domain.
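For readers unfamiliar with the Boltzmann operator named in this abstract, one common definition is the softmax-weighted average of the entries of a vector $a \in \mathbb{R}^n$, as written below. The notation is an assumption and may differ from the paper's.

```latex
\operatorname{softmax}(a)_i = \frac{e^{a_i}}{\sum_{j=1}^{n} e^{a_j}},
\qquad
\operatorname{boltz}(a) = \sum_{i=1}^{n} a_i \, \operatorname{softmax}(a)_i
```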
( 2
min )
We propose a convex signal reconstruction method for block sparsity under
arbitrary linear transform with unknown block structure. The proposed method is
a generalization of the existing method LOP-$\ell_2$/$\ell_1$ and can
reconstruct signals with block sparsity under non-invertible transforms, unlike
LOP-$\ell_2$/$\ell_1$. Our work broadens the scope of block sparse
regularization, enabling more versatile and powerful applications across
various signal processing domains. We derive an iterative algorithm for solving
the proposed method's optimization problem and provide conditions for its
convergence to the optimal
solution. Numerical experiments demonstrate the effectiveness of the proposed
method.
( 2
min )
Electronic health records (EHRs) are increasingly popular, and with them comes
the application of machine learning solutions to various problems in the domain.
This growing research area also raises the need for EHR accessibility. The
Medical Information Mart for Intensive Care (MIMIC) dataset is a popular,
public, and free EHR dataset in a raw format that has been used in numerous
studies. However, despite its popularity, it lacks benchmarking work,
especially with recent state-of-the-art work in the field of deep learning with
time-series tabular data. The aim of this work is to fill this gap by
providing a benchmark for the latest version of the MIMIC dataset, MIMIC-IV. We
also give a detailed literature survey of studies that have already been done
on MIMIC-III.
( 2
min )
Federated Learning (FL) enables collaborative model training among medical
centers without sharing private data. However, traditional FL is vulnerable to
server failures and can perform suboptimally on local data due to the nature of
centralized model aggregation. To address these issues, we present Gossip
Mutual Learning (GML), a decentralized framework that uses Gossip Protocol for
direct peer-to-peer communication. In addition, GML encourages each site to
optimize its local model through mutual learning to account for data variations
among different sites. For the task of tumor segmentation using 146 cases from
four clinical sites in the BraTS 2021 dataset, we demonstrate that GML
outperforms local models and achieves performance similar to FedAvg with only
25% communication overhead.
( 2
min )
The emergence of the novel dummy data injection attack (DDIA) poses a severe
threat to the secure and stable operation of power systems. These attacks are
particularly perilous due to the minimal Euclidean spatial separation between
the injected malicious data and legitimate data, rendering their precise
detection challenging using conventional distance-based methods. Furthermore,
existing research predominantly focuses on various machine learning techniques,
often analyzing the temporal data sequences post-attack or relying solely on
Euclidean spatial characteristics. Unfortunately, this approach tends to
overlook the inherent topological correlations within the non-Euclidean spatial
attributes of power grid data, consequently leading to diminished accuracy in
attack localization. To address this issue, this study takes a comprehensive
approach. Initially, it examines the underlying principles of these new DDIAs
on power systems. Here, an intricate mathematical model of the DDIA is
designed, accounting for incomplete topological knowledge and alternating
current (AC) state estimation from an attacker's perspective. Subsequently, by
integrating a priori knowledge of grid topology and considering the temporal
correlations within measurement data and the topology-dependent attributes of
the power grid, this study introduces temporal and spatial attention matrices.
These matrices adaptively capture the spatio-temporal correlations within the
attacks. Leveraging gated stacked causal convolution and graph wavelet sparse
convolution, the study jointly extracts spatio-temporal DDIA features. Finally,
the research proposes a DDIA localization method based on spatio-temporal graph
neural networks. The accuracy and effectiveness of the DDIA model are
rigorously demonstrated through comprehensive analytical cases.
( 3
min )
Object detection in reduced visibility has become a prominent research area.
The existing techniques are not accurate enough in recognizing objects under
such circumstances. This paper introduces a new foggy object detection method
through a two-staged architecture of region identification from input images
and detecting objects in such regions. The paper confirms notable improvements
of the proposed method's accuracy and detection time over existing techniques.
( 2
min )
Experiments at the High-Luminosity LHC and the Future Circular Collider need
efficient algorithms to reconstruct granular events expected at such detectors
with high fidelity. We study scalable machine learning models for event
reconstruction in electron-positron collisions based on a full detector
simulation. Particle-flow reconstruction can be formulated as a supervised
learning task using tracks and calorimeter clusters. We compare a graph neural
network and a kernel-based transformer and demonstrate that we can avoid
quadratic operations while achieving realistic reconstruction. We show that
hyperparameter tuning significantly improves the performance of the models. The
best graph neural network model shows improvement in the jet transverse
momentum resolution by up to 50% compared to the rule-based algorithm. Accurate
reconstruction can significantly improve future measurements at colliders. The
resulting model is portable across Nvidia, AMD and Habana hardware. Our
datasets and software are published following the findable, accessible,
interoperable, and reusable principles.
( 3
min )
Multi-distribution learning generalizes the classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over $k$ distributions, up to
$\epsilon$ additive error. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
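The learning goal stated in this abstract can be written out as below, with $D_1, \dots, D_k$ the data distributions, $\mathcal{H}$ the hypothesis class and $\hat{h}$ the learned hypothesis. The loss symbol $\ell$ and the name $L_{D_i}$ for the population loss are notational assumptions.

```latex
L_{D_i}(h) = \mathbb{E}_{(x, y) \sim D_i}\big[\ell(h(x), y)\big],
\qquad
\max_{i \in [k]} L_{D_i}(\hat{h})
  \;\le\; \min_{h \in \mathcal{H}} \max_{i \in [k]} L_{D_i}(h) + \epsilon
```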
( 2
min )
This paper studies the theoretical framework of the alignment process of
generative models with Reinforcement Learning from Human Feedback (RLHF). We
consider a standard mathematical formulation, the reverse-KL regularized
contextual bandit for RLHF. Despite its widespread practical application, a
rigorous theoretical analysis of this formulation remains open. We investigate
its behavior in three distinct settings -- offline, online, and hybrid -- and
propose efficient algorithms with finite-sample theoretical guarantees.
Moving towards practical applications, our framework, with a robust
approximation of the information-theoretical policy improvement oracle,
naturally gives rise to several novel RLHF algorithms. This includes an
iterative version of the Direct Preference Optimization (DPO) algorithm for
online settings, and a multi-step rejection sampling strategy for offline
scenarios. Our empirical evaluations in real-world alignment experiments with
large language models demonstrate that these proposed methods significantly
surpass existing strong baselines, such as DPO and Rejection Sampling
Optimization (RSO), showcasing the connections between solid theoretical
foundations and their powerful practical implementations.
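A common way to write the reverse-KL regularized objective mentioned in this abstract is given below, together with its well-known closed-form maximizer. The symbols are assumptions not taken from the abstract: $d_0$ is the prompt distribution, $r$ the reward, $\pi_0$ the reference policy and $\eta > 0$ the regularization strength.

```latex
\max_{\pi}\ \mathbb{E}_{x \sim d_0}\Big[
  \mathbb{E}_{a \sim \pi(\cdot \mid x)}\big[r(x, a)\big]
  - \eta\, \mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_0(\cdot \mid x)\big)
\Big],
\qquad
\pi^{*}(a \mid x) \propto \pi_0(a \mid x)\, \exp\!\big(r(x, a) / \eta\big)
```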
( 2
min )
Missing data is a common problem in practical settings. Various imputation
methods have been developed to deal with missing data. However, even though the
label is usually available in the training data, the common practice of
imputation usually only relies on the input and ignores the label. In this
work, we illustrate how stacking the label into the input can significantly
improve the imputation of the input. In addition, we propose a classification
strategy that initializes the predicted test label with missing values and
stacks the label with the input for imputation. This allows imputing the label
and the input at the same time. Also, the technique is capable of handling data
training with missing labels without any prior imputation and is applicable to
continuous, categorical, or mixed-type data. Experiments show promising results
in terms of accuracy.
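The label-stacking idea in this abstract can be illustrated with a small sketch: the label is appended to the input as an extra column, and the stacked table is imputed as a whole. The imputer used here is a simple 1-nearest-neighbour fill chosen only to keep the example short; it is an assumption, not the imputation method used in the paper. `None` marks a missing value, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def stack_label(X: Sequence[Sequence[Optional[float]]],
                y: Sequence[Optional[float]]) -> List[Row]:
    """Append the label as an extra column; None marks a missing value."""
    return [list(row) + [label] for row, label in zip(X, y)]


def _distance(a: Sequence[Optional[float]],
              b: Sequence[Optional[float]]) -> Optional[float]:
    """Mean squared difference over the columns observed in both rows."""
    shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not shared:
        return None
    return sum((u - v) ** 2 for u, v in shared) / len(shared)


def impute_1nn(data: Sequence[Sequence[Optional[float]]]) -> List[Row]:
    """Fill each missing entry from the nearest row that observes that column."""
    filled = [list(row) for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            best = None  # (distance, donor value)
            for k, other in enumerate(data):
                if k == i or other[j] is None:
                    continue
                d = _distance(row, other)
                if d is not None and (best is None or d < best[0]):
                    best = (d, other[j])
            if best is not None:
                filled[i][j] = best[1]
    return filled
```

With `X = [[1.0, 10.0], [1.2, 20.0], [1.05, None]]` and `y = [0.0, 1.0, 1.0]`, imputing `X` alone fills the gap with 10.0, while imputing `stack_label(X, y)` fills it with 20.0 because the label pulls the row towards its own class. Passing `None` as the label of a test row lets the same call impute that label together with the inputs, in the spirit of the classification strategy the abstract describes.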
( 2
min )
In this work, we study a natural nonparametric estimator of the transition
probability matrices of a finite controlled Markov chain. We consider an
offline setting with a fixed dataset, collected using a so-called logging
policy. We develop sample complexity bounds for the estimator and establish
conditions for minimaxity. Our statistical bounds depend on the logging policy
through its mixing properties. We show that achieving a particular statistical
risk bound involves a subtle and interesting trade-off between the strength of
the mixing properties and the number of samples. We demonstrate the validity of
our results under various examples, such as ergodic Markov chains, weakly
ergodic inhomogeneous Markov chains, and controlled Markov chains with
non-stationary Markov, episodic, and greedy controls. Lastly, we use these
sample complexity bounds to establish concomitant ones for offline evaluation
of stationary Markov control policies.
( 2
min )
Transfer learning plays a key role in modern data analysis when: (1) the
target data are scarce but the source data are sufficient; (2) the
distributions of the source and target data are heterogeneous. This paper
develops an interpretable unified transfer learning model, termed UTrans,
which can detect both transferable variables and source data. More
specifically, we establish the estimation error bounds and prove that our
bounds are lower than those with target data only. Besides, we propose a source
detection algorithm based on hypothesis testing to exclude the nontransferable
data. We evaluate and compare UTrans to the existing algorithms in multiple
experiments. It is shown that UTrans attains much lower estimation and
prediction errors than the existing methods, while preserving interpretability.
We finally apply it to the US intergenerational mobility data and compare our
proposed algorithms to the classical machine learning algorithms.
( 2
min )
We study policy optimization algorithms for computing correlated equilibria
in multi-player general-sum Markov Games. Previous results achieve
$O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated
$O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated
equilibrium. In this paper, we improve both results significantly by providing
an uncoupled policy optimization algorithm that attains a near-optimal
$\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.
Our algorithm is constructed by combining two main elements: (i) smooth value
updates and (ii) the optimistic-follow-the-regularized-leader algorithm with
the log barrier regularizer.
( 2
min )
Interpreting deep learning time series models is crucial in understanding the
model's behavior and learning patterns from raw data for real-time
decision-making. However, the complexity inherent in transformer-based time
series models poses challenges in explaining the impact of individual features
on predictions. In this study, we leverage recent local interpretation methods
to interpret state-of-the-art time series models. To use real-world datasets,
we collected three years of daily case data for 3,142 US counties. Firstly, we
compare six transformer-based models and choose the best prediction model for
COVID-19 infection. Using 13 input features from the last two weeks, we can
predict the cases for the next two weeks. Secondly, we present an innovative
way to evaluate the prediction sensitivity to 8 population age groups over
highly dynamic multivariate infection data. Thirdly, we compare our proposed
perturbation-based interpretation method with related work, including a total
of eight local interpretation methods. Finally, we apply our framework to
traffic and electricity datasets, demonstrating that our approach is generic
and can be applied to other time-series domains.
( 3
min )
This paper presents a plugin that adds a representation of homogeneous and
heterogeneous, optically thick, translucent materials to the Blender 3D
modeling tool. The working principle of this plugin is based on a combination
of Genetic Algorithm (GA) and Singular Value Decomposition (SVD)-based
subsurface scattering method (GenSSS). The proposed plugin has been implemented
using the Mitsuba renderer, which is open-source rendering software. The
proposed plugin has been validated on measured subsurface scattering data. It is
shown that the proposed plugin visualizes homogeneous and heterogeneous
subsurface scattering effects accurately, compactly, and computationally
efficiently.
( 2
min )
This paper presents a description of a real-world, multivariate time series
dataset collected from an anonymized engine component (called Component X) of a
fleet of trucks from SCANIA, Sweden. This dataset includes diverse variables
capturing detailed operational data, repair records, and specifications of
trucks while maintaining confidentiality by anonymization. It is well-suited
for a range of machine learning applications, such as classification,
regression, survival analysis, and anomaly detection, particularly when applied
to predictive maintenance scenarios. The large population size and variety of
features in the format of histograms and numerical counters, along with the
inclusion of temporal information, make this real-world dataset unique in the
field. The objective of releasing this dataset is to give a broad range of
researchers the possibility of working with real-world data from an
internationally well-known company and introduce a standard benchmark to the
predictive maintenance field, fostering reproducible research.
( 2
min )
Background: The semantics of entities extracted from a clinical text can be
dramatically altered by modifiers, including entity negation, uncertainty,
conditionality, severity, and subject. Existing models for determining
modifiers of clinical entities involve regular expressions or feature weights
that are trained independently for each modifier.
Methods: We develop and evaluate a multi-task transformer architecture design
where modifiers are learned and predicted jointly using the publicly available
SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that
contains modifiers shared with SemEval as well as novel modifiers specific for
OUD. We evaluate the effectiveness of our multi-task learning approach versus
previously published systems and assess the feasibility of transfer learning
for clinical entity modifiers when only a portion of clinical modifiers are
shared.
Results: Our approach achieved state-of-the-art results on the ShARe corpus
from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy,
1.7% on unweighted accuracy, and 10% on micro F1 scores.
Conclusions: We show that learned weights from our shared model can be
effectively transferred to a new partially matched data set, validating the use
of transfer learning for clinical text modifiers.
( 3
min )
Pre-training is known to generate universal representations for downstream
tasks in large-scale deep learning such as large language models. Existing
literature, e.g., \cite{kim2020adversarial}, empirically observes that the
downstream tasks can inherit the adversarial robustness of the pre-trained
model. We provide theoretical justifications for this robustness inheritance
phenomenon. Our theoretical results reveal that feature purification plays an
important role in connecting the adversarial robustness of the pre-trained
model and the downstream tasks in two-layer neural networks. Specifically, we
show that (i) with adversarial training, each hidden node tends to pick only
one (or a few) features; (ii) without adversarial training, the hidden nodes can
be vulnerable to attacks. This observation is valid for both supervised
pre-training and contrastive learning. With purified nodes, it turns out that
clean training is enough to achieve adversarial robustness in downstream tasks.
( 2
min )
We explore a novel methodology for constructing confidence regions for
parameters of linear models, using predictions from any arbitrary predictor.
Our framework requires minimal assumptions on the noise and can be extended to
functions deviating from strict linearity up to some adjustable threshold,
thereby accommodating a comprehensive and pragmatically relevant set of
functions. The derived confidence regions can be cast as constraints within a
Mixed Integer Linear Programming framework, enabling optimisation of linear
objectives. This representation enables robust optimization and the extraction
of confidence intervals for specific parameter coordinates. Unlike previous
methods, the confidence region can be empty, which can be used for hypothesis
testing. Finally, we validate the empirical applicability of our method on
synthetic data.
( 2
min )
This paper studies the estimation and inference of treatment histories in
panel data settings when treatments change dynamically over time.
We propose a method that allows for (i) treatments to be assigned dynamically
over time based on high-dimensional covariates, past outcomes and treatments;
(ii) outcomes and time-varying covariates to depend on treatment trajectories;
(iii) heterogeneity of treatment effects.
Our approach recursively projects potential outcomes' expectations on past
histories. It then controls the bias by balancing dynamically observable
characteristics. We study the asymptotic and numerical properties of the
estimator and illustrate the benefits of the procedure in an empirical
application.
( 2
min )
In this work, we study a natural nonparametric estimator of the transition
probability matrices of a finite controlled Markov chain. We consider an
offline setting with a fixed dataset, collected using a so-called logging
policy. We develop sample complexity bounds for the estimator and establish
conditions for minimaxity. Our statistical bounds depend on the logging policy
through its mixing properties. We show that achieving a particular statistical
risk bound involves a subtle and interesting trade-off between the strength of
the mixing properties and the number of samples. We demonstrate the validity of
our results under various examples, such as ergodic Markov chains, weakly
ergodic inhomogeneous Markov chains, and controlled Markov chains with
non-stationary Markov, episodic, and greedy controls. Lastly, we use these
sample complexity bounds to establish concomitant ones for offline evaluation
of stationary Markov control policies.
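As a rough illustration of what a count-based transition estimator for a finite controlled Markov chain looks like, the sketch below divides the number of observed (s, a, s') transitions by the number of visits to (s, a). Reading "natural nonparametric estimator" as this empirical-frequency estimator is an assumption, as are the function name and the NaN convention for unvisited pairs.

```python
import numpy as np

def estimate_transitions(states, actions, n_states, n_actions):
    """Empirical estimate of P(s' | s, a) from one logged trajectory.

    states has length T + 1 and actions has length T.
    Rows for unvisited (s, a) pairs are NaN (an illustrative choice).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in zip(states[:-1], actions, states[1:]):
        counts[s, a, s_next] += 1
    visits = counts.sum(axis=2, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(visits > 0, counts / visits, np.nan)

# Tiny logged trajectory over 2 states and 2 actions.
states = [0, 1, 0, 1, 1, 0]
actions = [0, 0, 0, 1, 1]
P_hat = estimate_transitions(states, actions, n_states=2, n_actions=2)
print(P_hat[1, 1])  # [0.5 0.5]
```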
( 2
min )
Transfer learning plays a key role in modern data analysis when: (1) the
target data are scarce but the source data are sufficient; (2) the
distributions of the source and target data are heterogeneous. This paper
develops an interpretable unified transfer learning model, termed UTrans,
which can detect both transferable variables and source data. More
specifically, we establish the estimation error bounds and prove that our
bounds are lower than those with target data only. Besides, we propose a source
detection algorithm based on hypothesis testing to exclude the nontransferable
data. We evaluate and compare UTrans to the existing algorithms in multiple
experiments. It is shown that UTrans attains much lower estimation and
prediction errors than the existing methods, while preserving interpretability.
We finally apply it to the US intergenerational mobility data and compare our
proposed algorithms to the classical machine learning algorithms.
( 2
min )
Missing data is a common problem in practical settings. Various imputation
methods have been developed to deal with missing data. However, even though the
label is usually available in the training data, the common practice of
imputation usually only relies on the input and ignores the label. In this
work, we illustrate how stacking the label into the input can significantly
improve the imputation of the input. In addition, we propose a classification
strategy that initializes the predicted test label with missing values and
stacks the label with the input for imputation. This allows imputing the label
and the input at the same time. Also, the technique is capable of handling
training data with missing labels without any prior imputation and is applicable to
continuous, categorical, or mixed-type data. Experiments show promising results
in terms of accuracy.
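The label-stacking idea can be illustrated with a toy experiment: impute a feature once from the other inputs alone, and once with the label appended as an extra column. The regression-based imputer below is a hypothetical stand-in written for this sketch (the abstract does not name a specific imputer), and the synthetic data is constructed so that the missing feature depends on the label.

```python
import numpy as np

def iterative_impute(Z, n_iter=10):
    """Fill NaNs column by column with least-squares regression on the other columns."""
    Z = np.array(Z, dtype=float)
    missing = np.isnan(Z)
    col_means = np.nanmean(Z, axis=0)
    Z[missing] = np.take(col_means, np.where(missing)[1])  # mean-fill to start
    for _ in range(n_iter):
        for j in range(Z.shape[1]):
            rows = missing[:, j]
            if not rows.any() or rows.all():
                continue
            A = np.column_stack([np.delete(Z, j, axis=1), np.ones(len(Z))])
            coef, *_ = np.linalg.lstsq(A[~rows], Z[~rows, j], rcond=None)
            Z[rows, j] = A[rows] @ coef
    return Z

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic data: feature 1 depends strongly on the binary label.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n).astype(float)
X = np.column_stack([rng.normal(size=n), 3.0 * y + 0.1 * rng.normal(size=n)])
mask = rng.random(n) < 0.3
X_miss = X.copy()
X_miss[mask, 1] = np.nan

plain = iterative_impute(X_miss)                              # input only
stacked = iterative_impute(np.column_stack([X_miss, y]))[:, :2]  # label stacked in

rmse_plain = rmse(plain[mask, 1], X[mask, 1])
rmse_stacked = rmse(stacked[mask, 1], X[mask, 1])
print(rmse_plain, rmse_stacked)
```

For the classification strategy in the abstract, the same stacked matrix would also contain test rows whose label entries are NaN, so that the imputer fills in labels and inputs together.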
( 2
min )
Experiments at the High-Luminosity LHC and the Future Circular Collider need
efficient algorithms to reconstruct granular events expected at such detectors
with high fidelity. We study scalable machine learning models for event
reconstruction in electron-positron collisions based on a full detector
simulation. Particle-flow reconstruction can be formulated as a supervised
learning task using tracks and calorimeter clusters. We compare a graph neural
network and a kernel-based transformer and demonstrate that we can avoid
quadratic operations while achieving realistic reconstruction. We show that
hyperparameter tuning significantly improves the performance of the models. The
best graph neural network model shows improvement in the jet transverse
momentum resolution by up to 50% compared to the rule-based algorithm. Accurate
reconstruction can significantly improve future measurements at colliders. The
resulting model is portable across Nvidia, AMD and Habana hardware. Our
datasets and software are published following the findable, accessible,
interoperable, and reusable principles.
( 3
min )
This paper proposes a model learning Semi-parametric relationships in an
Expert Bayesian Network (SEBN) with linear parameter and structure constraints.
We use Gaussian Processes and a Horseshoe prior to introduce minimal nonlinear
components. To prioritize modifying the expert graph over adding new edges,
we optimize differential Horseshoe scales. In real-world datasets with unknown
truth, we generate diverse graphs to accommodate user input, addressing
identifiability issues and enhancing interpretability. Evaluation on
synthetic and UCI Liver Disorders datasets, using metrics like structural
Hamming Distance and test likelihood, demonstrates that our models outperform
a state-of-the-art semi-parametric Bayesian Network model.
( 2
min )
We provide a nonasymptotic analysis of the convergence of the stochastic
gradient Hamiltonian Monte Carlo (SGHMC) to a target measure in Wasserstein-2
distance without assuming log-concavity. Our analysis quantifies key
theoretical properties of the SGHMC as a sampler under local conditions, which
significantly improves on previous results. In particular, we
prove that the Wasserstein-2 distance between the target and the law of the
SGHMC is uniformly controlled by the step-size of the algorithm, and therefore
demonstrate that the SGHMC can provide high-precision results uniformly in the
number of iterations. The analysis also allows us to obtain nonasymptotic
bounds for nonconvex optimization problems under local conditions and implies
that the SGHMC, when viewed as a nonconvex optimizer, converges to a global
minimum with the best known rates. We apply our results to obtain nonasymptotic
bounds for scalable Bayesian inference and nonasymptotic generalization bounds.
( 2
min )
Many machine learning applications require operating on a spatially
distributed dataset. Despite technological advances, privacy considerations and
communication constraints may prevent gathering the entire dataset in a central
unit. In this paper, we propose a distributed sampling scheme based on the
alternating direction method of multipliers, which is commonly used in the
optimization literature due to its fast convergence. In contrast to distributed
optimization, distributed sampling allows for uncertainty quantification in
Bayesian inference tasks. We provide both theoretical guarantees of our
algorithm's convergence and experimental evidence of its superiority to the
state-of-the-art. For our theoretical results, we use convex optimization tools
to establish a fundamental inequality on the generated local sample iterates.
This inequality enables us to show convergence of the distribution associated
with these iterates to the underlying target distribution in Wasserstein
distance. In simulation, we deploy our algorithm on linear and logistic
regression tasks and illustrate its fast convergence compared to existing
gradient-based methods.
( 2
min )
This paper studies the theoretical framework of the alignment process of
generative models with Reinforcement Learning from Human Feedback (RLHF). We
consider a standard mathematical formulation, the reverse-KL regularized
contextual bandit for RLHF. Despite its widespread practical application, a
rigorous theoretical analysis of this formulation remains open. We investigate
its behavior in three distinct settings -- offline, online, and hybrid -- and
propose efficient algorithms with finite-sample theoretical guarantees.
Moving towards practical applications, our framework, with a robust
approximation of the information-theoretical policy improvement oracle,
naturally gives rise to several novel RLHF algorithms. This includes an
iterative version of the Direct Preference Optimization (DPO) algorithm for
online settings, and a multi-step rejection sampling strategy for offline
scenarios. Our empirical evaluations on real-world alignment experiments with
large language models demonstrate that these proposed methods significantly
surpass existing strong baselines, such as DPO and Rejection Sampling
Optimization (RSO), showcasing the connections between solid theoretical
foundations and their powerful practical implementations.
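For a single context, the reverse-KL regularized objective has a well-known closed-form maximizer: the reference policy reweighted by exponentiated reward. The sketch below checks this numerically. The variable names, the toy reward vector, and the symbol eta for the regularization strength are assumptions made for illustration; none of this reproduces the paper's algorithms.

```python
import numpy as np

def kl_regularized_optimum(pi0, reward, eta):
    """Maximizer of  E_pi[r] - eta * KL(pi || pi0)  over distributions pi.

    The solution is pi(a) proportional to pi0(a) * exp(r(a) / eta).
    """
    logits = np.log(pi0) + reward / eta
    logits -= logits.max()  # stabilise the exponentials
    p = np.exp(logits)
    return p / p.sum()

def objective(pi, pi0, reward, eta):
    """Expected reward minus eta times the reverse KL to the reference policy."""
    return float(pi @ reward - eta * np.sum(pi * np.log(pi / pi0)))

pi0 = np.array([0.5, 0.3, 0.2])      # reference policy for one context
reward = np.array([1.0, 0.0, 2.0])   # toy rewards per action
eta = 0.5
pi_star = kl_regularized_optimum(pi0, reward, eta)
print(pi_star, objective(pi_star, pi0, reward, eta))
```

A small eta lets the policy concentrate on the highest-reward action, while a large eta keeps it close to the reference policy.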
( 2
min )
Multi-distribution learning generalizes the classic PAC learning to handle
data coming from multiple distributions. Given a set of $k$ data distributions
and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis
that minimizes the maximum population loss over $k$ distributions, up to
$\epsilon$ additive error. In this paper, we settle the sample complexity of
multi-distribution learning by giving an algorithm of sample complexity
$\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the
lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem
of Awasthi, Haghtalab and Zhao [AHZ23].
( 2
min )
In this work, we leverage the intrinsic segmentation of language sequences
and design a new positional encoding method called Bilevel Positional Encoding
(BiPE). For each position, our BiPE blends an intra-segment encoding and an
inter-segment encoding. The intra-segment encoding identifies the locations
within a segment and helps the model capture the semantic information therein
via absolute positional encoding. The inter-segment encoding specifies the
segment index, models the relationships between segments, and aims to improve
extrapolation capabilities via relative positional encoding. Theoretical
analysis shows this disentanglement of positional information makes learning
more effective. The empirical results also show that our BiPE has superior
length extrapolation capabilities across a wide range of tasks in diverse text
modalities.
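To make the two levels concrete, the sketch below assigns each token an intra-segment position and an inter-segment (segment) index. Splitting segments at sentence-ending punctuation is an assumption for illustration, and the sketch stops at the indices: how they feed into absolute and relative positional encodings is not shown.

```python
def bilevel_positions(tokens, separators=(".", "!", "?")):
    """Return (intra, inter): position within the segment and segment index per token."""
    intra, inter = [], []
    pos, seg = 0, 0
    for tok in tokens:
        intra.append(pos)
        inter.append(seg)
        if tok in separators:
            # A separator closes the current segment.
            pos, seg = 0, seg + 1
        else:
            pos += 1
    return intra, inter

tokens = ["the", "cat", "sat", ".", "it", "slept", "."]
intra, inter = bilevel_positions(tokens)
print(intra)  # [0, 1, 2, 3, 0, 1, 2]
print(inter)  # [0, 0, 0, 0, 1, 1, 1]
```

The intra-segment index stays bounded by the segment length however long the sequence grows, which is one intuition for why only the segment index has to extrapolate.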
( 2
min )
The notion of Boolean logic backpropagation was introduced to build neural
networks with weights and activations being Boolean numbers. Most
computations can be done with Boolean logic instead of real arithmetic, during
both the training and inference phases. But the underlying discrete optimization
problem is NP-hard, and the Boolean logic approach comes with no guarantee. In this work we
propose the first convergence analysis, under standard non-convex assumptions.
( 2
min )
Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that
aligns language models closely with human-centric values. The initial phase of
RLHF involves learning human values using a reward model from ranking data. It
is observed that the performance of the reward model degrades after one epoch
of training, and optimizing too much against the learned reward model
eventually hinders the true objective. This paper delves into these issues,
leveraging theoretical insights to design an improved reward learning
algorithm termed 'Iterative Data Smoothing' (IDS). The core idea is that during
each training epoch, we not only update the model with the data, but also
update the data using the model, replacing hard labels with soft labels. Our
empirical findings highlight the superior performance of this approach over the
traditional methods.
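The alternating update can be sketched on a toy Bradley-Terry reward model: each epoch takes one gradient step on the rewards using the current labels, then moves the labels toward the model's predicted preference probabilities. The step sizes, the interpolation rule with weight beta, and all names are assumptions for illustration, not the update rule from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def iterative_data_smoothing(pairs, labels, n_items, epochs=50, lr=0.5, beta=0.1):
    """Alternate a model update with a label update.

    pairs[k] = (i, j) and labels[k] = 1.0 means item i was preferred over item j.
    """
    pairs = np.asarray(pairs)
    i, j = pairs[:, 0], pairs[:, 1]
    theta = np.zeros(n_items)                   # one scalar reward per item
    soft = np.asarray(labels, dtype=float).copy()
    for _ in range(epochs):
        # Model update: gradient of cross-entropy against the current labels.
        p = sigmoid(theta[i] - theta[j])
        grad = np.zeros(n_items)
        np.add.at(grad, i, p - soft)
        np.add.at(grad, j, soft - p)
        theta -= lr * grad / len(pairs)
        # Data update: pull labels toward the model's predictions.
        p = sigmoid(theta[i] - theta[j])
        soft = (1.0 - beta) * soft + beta * p
    return theta, soft

pairs = [(0, 1), (0, 1), (1, 2), (0, 2)]
labels = [1.0, 1.0, 1.0, 1.0]
theta, soft = iterative_data_smoothing(pairs, labels, n_items=3)
print(theta, soft)
```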
( 2
min )
This paper develops a new dimension-free Azuma-Hoeffding type bound on the
summation norm of a martingale difference sequence with random individual
bounds. With this novel result, we provide high-probability bounds for the
gradient norm estimator in the proposed algorithm Prob-SARAH, which is a
modified version of the StochAstic Recursive grAdient algoritHm (SARAH), a
state-of-the-art variance reduced algorithm that achieves optimal computational
complexity in expectation for the finite sum problem. The in-probability
complexity by Prob-SARAH matches the best in-expectation result up to
logarithmic factors. Empirical experiments demonstrate the superior
probabilistic performance of Prob-SARAH on real datasets compared to other
popular algorithms.
( 2
min )
Good arm identification (GAI) is a pure-exploration bandit problem in which a
single learner outputs an arm as soon as it is identified as a good arm. A good
arm is defined as an arm with an expected reward greater than or equal to a
given threshold. This paper focuses on the GAI problem under a small threshold
gap, which refers to the distance between the expected rewards of arms and the
given threshold. We propose a new algorithm called lil'HDoC to significantly
improve the total sample complexity of the HDoC algorithm. We demonstrate that
the sample complexity of the first $\lambda$ output arms in lil'HDoC is bounded
by the original HDoC algorithm, except for one negligible term, when the
distance between the expected reward and threshold is small. Extensive
experiments confirm that our algorithm outperforms the state-of-the-art
algorithms in both synthetic and real-world datasets.
( 2
min )
We present new concentration inequalities for either martingale dependent or
exchangeable random symmetric matrices under a variety of tail conditions,
encompassing standard Chernoff bounds to self-normalized heavy-tailed settings.
These inequalities are often randomized in a way that renders them strictly
tighter than existing deterministic results in the literature, are typically
expressed in the Loewner order, and are sometimes valid at arbitrary
data-dependent stopping times.
Along the way, we explore the theory of matrix supermartingales and maximal
inequalities, potentially of independent interest.
( 2
min )
Matrix completion is one of the crucial tools in modern data science
research. Recently, a novel sampling model for matrix completion coined
cross-concentrated sampling (CCS) has caught much attention. However, the
robustness of the CCS model against sparse outliers remains unclear in the
existing studies. In this paper, we aim to answer this question by exploring a
novel Robust CCS Completion problem. A highly efficient non-convex iterative
algorithm, dubbed Robust CUR Completion (RCURC), is proposed. The empirical
performance of the proposed algorithm, in terms of both efficiency and
robustness, is verified in synthetic and real datasets.
( 2
min )
The paper studies the problem of constructing nonparametric simultaneous
confidence bands with nonasymptotic and distribution-free guarantees. The
target function is assumed to be band-limited and the approach is based on the
theory of Paley-Wiener reproducing kernel Hilbert spaces. The starting point of
the paper is a recently developed algorithm to which we propose three types of
improvements. First, we relax the assumptions on the noises by replacing the
symmetricity assumption with a weaker distributional invariance principle.
Then, we propose a more efficient way to estimate the norm of the target
function, and finally we enhance the construction of the confidence bands by
tightening the constraints of the underlying convex optimization problems. The
refinements are also illustrated through numerical experiments.
( 2
min )
Data analysis often requires methods that are invariant with respect to
specific transformations, such as rotations in case of images or shifts in case
of images and time series. While principal component analysis (PCA) is a
widely-used dimension reduction technique, it lacks robustness with respect to
these transformations. Modern alternatives, such as autoencoders, can be
invariant with respect to specific transformations but are generally not
interpretable. We introduce General Transform-Invariant Principal Component
Analysis (GT-PCA) as an effective and interpretable alternative to PCA and
autoencoders. We propose a neural network that efficiently estimates the
components and show that GT-PCA significantly outperforms alternative methods
in experiments based on synthetic and real data.
( 2
min )
Logistic regression is a ubiquitous method for probabilistic classification.
However, the effectiveness of logistic regression depends upon careful and
relatively computationally expensive tuning, especially for the regularisation
hyperparameter, and especially in the context of high-dimensional data. We
present a prevalidated ridge regression model that closely matches logistic
regression in terms of classification error and log-loss, particularly for
high-dimensional data, while being significantly more computationally efficient
and having effectively no hyperparameters beyond regularisation. We scale the
coefficients of the model so as to minimise log-loss for a set of prevalidated
predictions derived from the estimated leave-one-out cross-validation error.
This exploits quantities already computed in the course of fitting the ridge
regression model in order to find the scaling parameter with nominal additional
computational expense.
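A minimal sketch of the idea, under stated assumptions: labels are coded as -1/+1, the ridge model has no intercept, leave-one-out predictions come from the standard hat-matrix identity, and the scaling parameter is found by a simple grid search. These choices and all names are illustrative and may differ from the paper's procedure.

```python
import numpy as np

def ridge_loo_predictions(X, y, lam):
    """Ridge coefficients plus exact leave-one-out predictions via the hat matrix."""
    d = X.shape[1]
    G = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)  # shape (d, n)
    beta = G @ y
    h = np.einsum("ij,ji->i", X, G)                      # diagonal of H = X G
    fitted = X @ beta
    loo = (fitted - h * y) / (1.0 - h)
    return beta, loo

def log_loss(scores, y_pm):
    """Mean logistic loss for labels in {-1, +1}."""
    return float(np.mean(np.logaddexp(0.0, -y_pm * scores)))

def fit_scale(loo, y_pm):
    """Pick the scale c that minimises log-loss of sigmoid(c * loo) on a grid."""
    grid = np.append(np.logspace(-2, 2, 200), 1.0)
    losses = [log_loss(c * loo, y_pm) for c in grid]
    return float(grid[int(np.argmin(losses))])

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
w = rng.normal(size=5)
y = np.where(X @ w + 0.5 * rng.normal(size=80) > 0, 1.0, -1.0)
lam = 1.0

beta, loo = ridge_loo_predictions(X, y, lam)
c_best = fit_scale(loo, y)
scaled_beta = c_best * beta   # coefficients used for probability estimates
print(c_best, log_loss(c_best * loo, y))
```

The hat-matrix diagonal and fitted values are by-products of the ridge fit, which is why the scaling step adds little extra cost.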
( 2
min )
When generative AI is given a prompt to display an image in a certain way or style, what it also means is telling AI to imagine. The request to imagine is an acknowledgment that it has a will to do so, not just the capability [or the possession of contents] to do so. This will… Read More »GenAI regulation: Are deepfakes indicative of free will in LLMs?
The post GenAI regulation: Are deepfakes indicative of free will in LLMs? appeared first on Data Science Central.
( 22
min )
A podcast with CEO Ricky Sun of Ultipa. Relationship-rich graph structures can be quite complex and resource-consuming to process at scale when using conventional technology. This is particularly the case when it comes to searches that demand the computation to reach 30 hops or more into the graphs. … Read More »High-performance computing’s role in real-time graph analytics
The post High-performance computing’s role in real-time graph analytics appeared first on Data Science Central.
( 20
min )
With the advent of generative AI, today’s foundation models (FMs), such as the large language models (LLMs) Claude 2 and Llama 2, can perform a range of generative tasks such as question answering, summarization, and content creation on text data. However, real-world data exists in multiple modalities, such as text, images, video, and audio. Take […]
( 12
min )
Microsoft announces the AFMR Minority Serving Institutions grant recipients, advancing AI research focused on today’s most significant technical and societal challenges. The grant provides funding and access to Azure-hosted foundation models.
The post Announcing recipients of the AFMR Minority Serving Institutions grant appeared first on Microsoft Research.
( 8
min )
This week’s featured In the NVIDIA Studio 3D artist Brandon Tieh puts his artistic talents on full display with his whimsical scene “Magic Valley.”
( 7
min )
Counterfactual explanations, and their associated algorithmic recourse, are
typically leveraged to understand, explain, and potentially alter a prediction
coming from a black-box classifier. In this paper, we propose to extend the use
of counterfactuals to evaluate progress in sequential decision making tasks. To
this end, we introduce a model-agnostic modular framework, TraCE (Trajectory
Counterfactual Explanation) scores, which is able to distill and condense
progress in highly complex scenarios into a single value. We demonstrate
TraCE's utility across domains by showcasing its main properties in two case
studies spanning healthcare and climate change.
( 2
min )
Markov processes are widely used mathematical models for describing dynamic
systems in various fields. However, accurately simulating large-scale systems
at long time scales is computationally expensive due to the short time steps
required for accurate integration. In this paper, we introduce an inference
process that maps complex systems into a simplified representational space and
models large jumps in time. To achieve this, we propose Time-lagged Information
Bottleneck (T-IB), a principled objective rooted in information theory, which
aims to capture relevant temporal features while discarding high-frequency
information to simplify the simulation task and minimize the inference error.
Our experiments demonstrate that T-IB learns information-optimal
representations for accurately modeling the statistical properties and dynamics
of the original process at a selected time lag, outperforming existing
time-lagged dimensionality reduction methods.
( 2
min )
We consider the task of estimating functions belonging to a specific class of
nonsmooth functions, namely so-called tame functions. These functions appear in
a wide range of applications: training deep learning, value functions of
mixed-integer programs, or wave functions of small molecules. We show that tame
functions are approximable by piecewise polynomials on any full-dimensional
cube. We then present the first ever mixed-integer programming formulation of
piecewise polynomial regression. Together, these can be used to estimate tame
functions. We demonstrate promising computational results.
( 2
min )
With the rise of powerful closed-source LLMs (ChatGPT, GPT-4), there is
increasing interest in distilling the capabilities of closed-source LLMs into
smaller open-source LLMs. Previous distillation methods usually prompt ChatGPT
to generate a set of instructions and answers, for the student model to learn.
However, such a standard distillation approach neglects the merits and conditions
of the student model. Inspired by modern teaching principles, we design a
personalised distillation process, in which the student attempts to solve a
task first, then the teacher provides an adaptive refinement for the student to
improve. Instead of feeding the student with the teacher's prior, personalised
distillation enables personalised learning for the student model, as it only
learns on examples it makes mistakes upon and learns to improve its own
solution. On code generation, personalised distillation consistently
outperforms standard distillation with only one third of the data. With only
2.5-3K personalised examples that incur a data-collection cost of 4-6 USD, we
boost CodeGen-mono-16B by 7% to achieve 36.4% pass@1 and StarCoder by 12.2% to
achieve 45.8% pass@1 on HumanEval.
( 2
min )
The performance of data fusion and tracking algorithms often depends on
parameters that not only describe the sensor system, but can also be
task-specific. While for the sensor system tuning these variables is
time-consuming and mostly requires expert knowledge, intrinsic parameters of
targets under track can even be completely unobservable until the system is
deployed. With state-of-the-art sensor systems growing more and more complex,
the number of parameters naturally increases, necessitating the automatic
optimization of the model variables. In this paper, the parameters of an
interacting multiple model (IMM) filter are optimized solely using
measurements, thus without necessity for any ground-truth data. The resulting
method is evaluated through an ablation study on simulated data, where the
trained model manages to match the performance of a filter parametrized with
ground-truth values.
( 2
min )
Training offline reinforcement learning (RL) models using visual inputs poses
two significant challenges, i.e., the overfitting problem in representation
learning and the overestimation bias for expected future rewards. Recent work
has attempted to alleviate the overestimation bias by encouraging conservative
behaviors. This paper, in contrast, tries to build more flexible constraints
for value estimation without impeding the exploration of potential advantages.
The key idea is to leverage off-the-shelf RL simulators, which can be easily
interacted with in an online manner, as the "test bed" for offline policies. To
enable effective online-to-offline knowledge transfer, we introduce CoWorld, a
model-based RL approach that mitigates cross-domain discrepancies in state and
reward spaces. Experimental results demonstrate the effectiveness of CoWorld,
outperforming existing RL approaches by large margins.
( 2
min )
The age and stroke-associated decline in musculoskeletal strength degrades
the ability to perform daily human tasks using the upper extremities. Although
there are a few examples of exoskeletons, they need manual operations due to
the absence of sensor feedback and no intention prediction of movements. Here,
we introduce an intelligent upper-limb exoskeleton system that uses cloud-based
deep learning to predict human intention for strength augmentation. The
embedded soft wearable sensors provide sensory feedback by collecting real-time
muscle signals, which are simultaneously computed to determine the user's
intended movement. The cloud-based deep learning predicts four upper-limb joint
motions with an average accuracy of 96.2% at a 200-250 millisecond response
rate, suggesting that the exoskeleton operates just by human intention. In
addition, an array of soft pneumatics assists the intended movements by
providing up to 897 newtons of force and 78.7 millimeters of displacement.
Collectively, the intent-driven exoskeleton can augment human strength by 5.15
times on average compared to the unassisted exoskeleton. This report
demonstrates an exoskeleton robot that augments the upper-limb joint movements
by human intention based on machine-learning cloud computing and sensory
feedback.
( 3
min )
Contrastive self-supervised learning has gained attention for its ability to
create high-quality representations from large unlabelled data sets. A key
reason that these powerful features enable data-efficient learning of
downstream tasks is that they provide augmentation invariance, which is often a
useful inductive bias. However, the amount and type of invariance preferred is
not known a priori and varies across different downstream tasks. We therefore
propose a multi-task self-supervised framework (MT-SLVR) that learns both
variant and invariant features in a parameter-efficient manner. Our multi-task
representation provides a strong and flexible feature that benefits diverse
downstream tasks. We evaluate our approach on few-shot classification tasks
drawn from a variety of audio domains and demonstrate improved classification
performance on all of them.
( 2
min )
Low-rank matrix completion consists of computing a matrix of minimal
complexity that recovers a given set of observations as accurately as possible.
Unfortunately, existing methods for matrix completion are heuristics that,
while highly scalable and often identifying high-quality solutions, do not
possess any optimality guarantees. We reexamine matrix completion with an
optimality-oriented eye. We reformulate these low-rank problems as convex
problems over the non-convex set of projection matrices and implement a
disjunctive branch-and-bound scheme that solves them to certifiable optimality.
Further, we derive a novel and often tight class of convex relaxations by
decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing
that two-by-two minors in each rank-one matrix have determinant zero. In
numerical experiments, our new convex relaxations decrease the optimality gap
by two orders of magnitude compared to existing attempts, and our disjunctive
branch-and-bound scheme solves $n \times n$ rank-$r$ matrix completion problems to
certifiable optimality in hours for $n \leq 150$ and $r \leq 5$.
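The relaxation described above rests on a standard linear-algebra fact: every two-by-two minor of a rank-one matrix has determinant zero. The following is a minimal numerical sketch of that fact only, not of the branch-and-bound scheme or the relaxations themselves. The helper name `max_abs_2x2_minor` and the matrix sizes are illustrative assumptions.

```python
import itertools

import numpy as np


def max_abs_2x2_minor(M):
    """Return the largest absolute determinant over all 2x2 minors of M."""
    rows, cols = M.shape
    worst = 0.0
    for i, k in itertools.combinations(range(rows), 2):
        for j, l in itertools.combinations(range(cols), 2):
            # Determinant of the minor built from rows (i, k) and columns (j, l).
            minor = M[i, j] * M[k, l] - M[i, l] * M[k, j]
            worst = max(worst, abs(minor))
    return worst


rng = np.random.default_rng(0)

# A rank-one matrix is an outer product u v^T.
rank_one = np.outer(rng.normal(size=4), rng.normal(size=5))

# Adding a second, independent outer product generically raises the rank to two.
rank_two = rank_one + np.outer(rng.normal(size=4), rng.normal(size=5))

rank_one_minor = max_abs_2x2_minor(rank_one)  # zero up to rounding error
rank_two_minor = max_abs_2x2_minor(rank_two)  # clearly nonzero
```

For the rank-one matrix each minor equals `u[i]*v[j]*u[k]*v[l] - u[i]*v[l]*u[k]*v[j]`, which cancels exactly in exact arithmetic; the rank-two matrix has no such cancellation.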
( 2
min )
In this study, we harness the information-theoretic Privacy Funnel (PF) model
to develop a method for privacy-preserving representation learning using an
end-to-end training framework. We rigorously address the trade-off between
obfuscation and utility. Both are quantified through the logarithmic loss, a
measure also recognized as self-information loss. This exploration deepens the
interplay between information-theoretic privacy and representation learning,
offering substantive insights into data protection mechanisms for both
discriminative and generative models. Importantly, we apply our model to
state-of-the-art face recognition systems. The model demonstrates adaptability
across diverse inputs, from raw facial images to derived or refined
embeddings, and is competent in tasks such as classification, reconstruction,
and generation.
( 2
min )
We present the first $\varepsilon$-differentially private, computationally
efficient algorithm that estimates the means of product distributions over
$\{0,1\}^d$ accurately in total-variation distance, whilst attaining the
optimal sample complexity to within polylogarithmic factors. Prior work had
either solved this problem efficiently and optimally under weaker notions of
privacy, or had solved it optimally while having exponential running times.
( 2
min )
We propose an approach for continuous prediction of turn-taking and
backchanneling locations in spoken dialogue by fusing a neural acoustic model
with a large language model (LLM). Experiments on the Switchboard human-human
conversation dataset demonstrate that our approach consistently outperforms the
baseline models with single modality. We also develop a novel multi-task
instruction fine-tuning strategy to further benefit from LLM-encoded knowledge
for understanding the tasks and conversational contexts, leading to additional
improvements. Our approach demonstrates the potential of combined LLMs and
acoustic models for a more natural and conversational interaction between
humans and speech-enabled AI agents.
( 2
min )
We study the geometry of linear networks with one-dimensional convolutional
layers. The function spaces of these networks can be identified with
semi-algebraic families of polynomials admitting sparse factorizations. We
analyze the impact of the network's architecture on the function space's
dimension, boundary, and singular points. We also describe the critical points
of the network's parameterization map. Furthermore, we study the optimization
problem of training a network with the squared error loss. We prove that for
architectures where all strides are larger than one and generic data, the
non-zero critical points of that optimization problem are smooth interior
points of the function space. This property is known to be false for dense
linear networks and linear convolutional networks with stride one.
( 2
min )
Performing classification on noisy, crowdsourced image datasets can prove
challenging even for the best neural networks. Two issues which complicate the
problem on such datasets are class imbalance and ground-truth uncertainty in
labeling. The AL-ALL and AL-PUB datasets - consisting of tightly cropped,
individual characters from images of ancient Greek papyri - are strongly
affected by both issues. The application of ensemble modeling to such datasets
can help identify images where the ground-truth is questionable and quantify
the trustworthiness of those samples. As such, we apply stacked generalization
consisting of nearly identical ResNets with different loss functions: one
utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler Divergence
(KLD). Both networks use labels drawn from a crowd-sourced consensus. This
consensus is derived from a Normalized Distribution of Annotations (NDA) based
on all annotations for a given character in the dataset. For the second
network, the KLD is calculated with respect to the NDA. For our ensemble model,
we apply a k-nearest neighbors model to the outputs of the CXE and KLD
networks. Individually, the ResNet models have approximately 93% accuracy,
while the ensemble model achieves an accuracy of > 95%, increasing the
classification trustworthiness. We also perform an analysis of the Shannon
entropy of the various models' output distributions to measure classification
uncertainty. Our results suggest that entropy is useful for predicting model
misclassifications.
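The two quantities this abstract relies on can be sketched in a few lines: the Shannon entropy of a model's output distribution, and the Kullback-Leibler divergence of a prediction from a normalized distribution of annotations. This is a minimal illustration of the standard formulas, not the authors' code; the annotation counts and the two example predictions are made-up values.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]  # terms with p = 0 contribute nothing
    return float(-np.sum(nonzero * np.log2(nonzero)))


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) in nats; eps guards against log(0)."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = target > 0
    return float(np.sum(target[mask] * np.log(target[mask] / (predicted[mask] + eps))))


# Hypothetical annotation counts for one character image across three classes.
counts = np.array([6.0, 3.0, 1.0])
nda = counts / counts.sum()  # normalized distribution of annotations

confident = np.array([0.97, 0.02, 0.01])  # low-entropy prediction
uncertain = np.array([0.40, 0.35, 0.25])  # high-entropy prediction

entropy_confident = shannon_entropy(confident)
entropy_uncertain = shannon_entropy(uncertain)
kld_confident = kl_divergence(nda, confident)
```

A higher entropy value marks a prediction the model is less sure about, which is the signal the abstract reports as useful for flagging likely misclassifications.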
( 3
min )
In this paper, we study the expressivity of scalar, Markovian reward
functions in Reinforcement Learning (RL), and identify several limitations to
what they can express. Specifically, we look at three classes of RL tasks:
multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive
necessary and sufficient conditions that describe when a problem in this class
can be expressed using a scalar, Markovian reward. Moreover, we find that
scalar, Markovian rewards are unable to express most of the instances in each
of these three classes. We thereby contribute to a more complete understanding
of what standard reward functions can and cannot express. In addition to this,
we also call attention to modal problems as a new class of problems, since they
have so far not been given any systematic treatment in the RL literature. We
also briefly outline some approaches for solving some of the problems we
discuss, by means of bespoke RL algorithms.
( 2
min )
This paper introduces a novel approach to enumerating and assessing trapping sets
in quasi-cyclic codes whose circulant sizes are non-prime numbers.
Leveraging the quasi-cyclic properties, the method employs a tabular technique
to streamline the importance sampling step for estimating the pseudo-codeword
weight of trapping sets. The presented methodology draws on the mathematical
framework established in the provided theorem, which elucidates the behavior of
projection and lifting transformations on pseudo-codewords.
( 2
min )
The validation of global climate models is crucial to ensure the accuracy and
efficacy of model output. We introduce the spherical convolutional Wasserstein
distance to more comprehensively measure differences between climate models and
reanalysis data. This new similarity measure accounts for spatial variability
using convolutional projections and quantifies local differences in the
distribution of climate variables. We apply this method to evaluate the
historical model outputs of the Coupled Model Intercomparison Project (CMIP)
members by comparing them to observational and reanalysis data products.
Additionally, we investigate the progression from CMIP phase 5 to phase 6 and
find modest improvements in the phase 6 models regarding their ability to
produce realistic climatologies.
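For orientation, the basic building block behind such a measure is the Wasserstein distance between two distributions. The sketch below computes the plain one-dimensional Wasserstein-1 distance between two equal-sized samples; it is not the spherical convolutional variant introduced in the abstract, and the temperature values are made-up illustrative numbers.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-sized 1D empirical samples.

    For equal sample sizes the optimal transport plan matches sorted values,
    so the distance is the mean absolute difference of the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


# Hypothetical local temperature samples (degrees Celsius).
reanalysis = np.array([11.0, 12.5, 14.0, 15.5])
model_output = reanalysis + 2.0  # a model with a uniform warm bias

bias_distance = wasserstein_1d(model_output, reanalysis)
```

A uniform shift of two degrees moves every unit of probability mass by two, so the distance equals the size of the shift.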
( 2
min )
Much of the research in differential privacy has focused on offline
applications with the assumption that all data is available at once. When these
algorithms are applied in practice to streams where data is collected over
time, this either violates the privacy guarantees or results in poor utility.
We derive an algorithm for differentially private synthetic streaming data
generation, especially curated towards spatial datasets. Furthermore, we
provide a general framework for online selective counting among a collection of
queries which forms a basis for many tasks such as query answering and
synthetic data generation. The utility of our algorithm is verified on both
real-world and simulated datasets.
( 2
min )
One problem with researching cognitive modeling and reinforcement learning
(RL) is that researchers spend too much time on setting up an appropriate
computational framework for their experiments. Many open source implementations
of current RL algorithms exist, but there is a lack of a modular suite of tools
combining different robotic simulators and platforms, data visualization,
hyperparameter optimization, and baseline experiments. To address this problem,
we present Scilab-RL, a software framework for efficient research in cognitive
modeling and reinforcement learning for robotic agents. The framework focuses
on goal-conditioned reinforcement learning using Stable Baselines 3 and the
OpenAI gym interface. It enables native possibilities for experiment
visualizations and hyperparameter optimization. We describe how these features
enable researchers to conduct experiments with minimal time effort, thus
maximizing research output.
( 2
min )
Partial differential equations (PDEs) are commonly employed to model complex
industrial systems characterized by multivariable dependence. Existing
physics-informed neural networks (PINNs) excel in solving PDEs in a homogeneous
medium. However, their feasibility is diminished when PDE parameters are
unknown due to a lack of physical attributions and when the time-varying interface
arising from heterogeneous media is unavailable. To this end, we propose a
data-physics-hybrid method, physically informed synchronic-adaptive learning
(PISAL), to solve PDEs for industrial systems modeling in heterogeneous media.
First, Net1, Net2, and NetI are constructed to approximate the solutions
satisfying PDEs and the interface. Net1 and Net2 are utilized to synchronously
learn each solution satisfying PDEs with diverse parameters, while NetI is
employed to adaptively learn the unavailable time-varying interface. Then, a
criterion combined with NetI is introduced to adaptively distinguish the
attributions of measurements and collocation points. Furthermore, NetI is
integrated into a data-physics-hybrid loss function. Accordingly, a
synchronic-adaptive learning (SAL) strategy is proposed to decompose and
optimize each subdomain. Besides, we theoretically prove the approximation
capability of PISAL. Extensive experimental results verify that the proposed
PISAL can be used for industrial systems modeling in heterogeneous media, which
faces the challenges of lack of physical attributions and unavailable
time-varying interface.
( 2
min )
Prompt design and engineering has become an important discipline in just the
past few months. In this paper, we provide an introduction to the main concepts
as well as review basic and more advanced approaches to prompt design and
engineering.
( 2
min )
We investigate the problem of learning Linear Quadratic Regulators (LQR) in a
multi-task, heterogeneous, and model-free setting. We characterize the
stability and personalization guarantees of a Policy Gradient-based (PG)
Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) approach for the LQR
problem under different task-heterogeneity settings. We show that the MAML-LQR
approach produces a stabilizing controller close to each task-specific optimal
controller up to a task-heterogeneity bias for both model-based and model-free
settings. Moreover, in the model-based setting, we show that this controller is
achieved with a linear convergence rate, which improves upon sub-linear rates
presented in existing MAML-LQR work. In contrast to existing MAML-LQR results,
our theoretical guarantees demonstrate that the learned controller can
efficiently adapt to unseen LQR tasks.
( 2
min )
Machine learning is about forecasting. Forecasts, however, obtain their
usefulness only through their evaluation. Machine learning has traditionally
focused on types of losses and their corresponding regret. Recently, the
machine learning community has regained interest in calibration. In this work, we
show the conceptual equivalence of calibration and regret in evaluating
forecasts. We frame the evaluation problem as a game between a forecaster, a
gambler and nature. Putting intuitive restrictions on gambler and forecaster,
calibration and regret naturally fall out of the framework. In addition, this
game links evaluation of forecasts to randomness of outcomes. Random outcomes
with respect to forecasts are equivalent to good forecasts with respect to
outcomes. We call those dual aspects, calibration and regret, predictiveness
and randomness, the four facets of forecast felicity.
( 2
min )
We present a manifold-based autoencoder method for learning nonlinear
dynamics in time, notably partial differential equations (PDEs), in which the
manifold latent space evolves according to Ricci flow. This can be accomplished
by simulating Ricci flow in a physics-informed setting, and manifold quantities
can be matched so that Ricci flow is empirically achieved. With our
methodology, the manifold is learned as part of the training procedure, so
ideal geometries may be discerned, while the evolution simultaneously induces a
more accommodating latent representation over static methods. We present our
method on a range of numerical experiments consisting of PDEs that encompass
desirable characteristics such as periodicity and randomness, reporting error
on in-distribution and extrapolation scenarios.
( 2
min )
Kalman filters provide a straightforward and interpretable means to estimate
hidden or latent variables, and have found numerous applications in control,
robotics, signal processing, and machine learning. One such application is
neural decoding for neuroprostheses. In 2020, Burkhart et al. thoroughly
evaluated their new version of the Kalman filter that leverages Bayes' theorem
to improve filter performance for highly non-linear or non-Gaussian observation
models. This work provides an open-source Python alternative to the authors'
MATLAB algorithm. Specifically, we reproduce their most salient results for
neuroscientific contexts and further examine the efficacy of their filter using
multiple random seeds and previously unused trials from the authors' dataset.
All experiments were performed offline on a single computer.
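As background for the filter discussed above, the standard linear Kalman filter alternates a predict step and an update step. The sketch below implements that textbook recursion only; it is not the Bayes-rule-based variant of Burkhart et al. nor the Python package the abstract describes. The one-dimensional toy problem and all noise settings are assumptions chosen for illustration.

```python
import numpy as np


def kalman_step(x, P, z, A, Q, H, R):
    """One predict/update cycle of the standard linear Kalman filter."""
    # Predict: propagate the state estimate and its covariance.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q

    # Update: weigh the new observation z by the Kalman gain K.
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


# Toy problem: a nearly constant hidden value observed with unit-variance noise.
rng = np.random.default_rng(0)
true_state = 5.0
A = np.eye(1)             # state transition
H = np.eye(1)             # observation model
Q = np.array([[1e-5]])    # process noise covariance
R = np.array([[1.0]])     # observation noise covariance

x = np.zeros(1)           # initial estimate
P = np.array([[10.0]])    # initial uncertainty
for _ in range(200):
    z = np.array([true_state + rng.normal()])
    x, P = kalman_step(x, P, z, A, Q, H, R)
```

After the loop, `x` holds the filtered estimate of the hidden value and `P` its remaining uncertainty, which shrinks as observations accumulate.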
( 2
min )
This paper serves as a comprehensive system description of version 2.0 of the
Marabou framework for formal analysis of neural networks. We discuss the tool's
architectural design and highlight the major features and components introduced
since its initial release.
( 2
min )
Gradient Langevin dynamics and a variety of its variants have attracted
increasing attention owing to their convergence towards the global optimal
solution, initially in the unconstrained convex framework while recently even
in convex constrained non-convex problems. In the present work, we extend those
frameworks to non-convex problems on a non-convex feasible region with a global
optimization algorithm built upon reflected gradient Langevin dynamics and
derive its convergence rates. By effectively making use of its reflection at
the boundary in combination with the probabilistic representation for the
Poisson equation with the Neumann boundary condition, we present promising
convergence rates, particularly faster than the existing one for convex
constrained non-convex problems.
( 2
min )
We consider the community detection problem in a sparse $q$-uniform
hypergraph $G$, assuming that $G$ is generated according to the Hypergraph
Stochastic Block Model (HSBM). We prove that a spectral method based on the
non-backtracking operator for hypergraphs works with high probability down to
the generalized Kesten-Stigum detection threshold conjectured by Angelini et
al. (2015). We characterize the spectrum of the non-backtracking operator for
the sparse HSBM and provide an efficient dimension reduction procedure using
the Ihara-Bass formula for hypergraphs. As a result, community detection for
the sparse HSBM on $n$ vertices can be reduced to an eigenvector problem of a
$2n\times 2n$ non-normal matrix constructed from the adjacency matrix and the
degree matrix of the hypergraph. To the best of our knowledge, this is the
first provable and efficient spectral algorithm that achieves the conjectured
threshold for HSBMs with $r$ blocks generated according to a general symmetric
probability tensor.
( 2
min )
Numerous robust estimators exist as alternatives to the maximum likelihood
estimator (MLE) when a completely observed ground-up loss severity sample
dataset is available. However, the options for robust alternatives to MLE
become significantly limited when dealing with grouped loss severity data, with
only a handful of methods like least squares, minimum Hellinger distance, and
optimal bounded influence function available. This paper introduces a novel
robust estimation technique, the Method of Truncated Moments (MTuM),
specifically designed to estimate the tail index of a Pareto distribution from
grouped data. Inferential justification of MTuM is established by employing the
central limit theorem and validated through a comprehensive simulation
study.
( 2
min )
When deploying a large language model (LLM), machine learning (ML) practitioners typically care about two measurements for model serving performance: latency, defined by the time it takes to generate a single token, and throughput, defined by the number of tokens generated per second. Although a single request to the deployed endpoint would exhibit a throughput […]
( 22
min )
Hip disorders, comprising some of the world’s most common joint diseases, are especially prevalent among adolescents and young adults, causing stiffness, pain or a limp. But they can be hard to diagnose using solely 2D medical imaging. Helping to treat these disorders, the Boston Children’s Hospital’s (BCH’s) Adolescent and Young Adult Hip Preservation Program is […]
( 6
min )
A model on its own is typically not enough. It requires the data, which comes in a very specific format and has to be the same format that will be used at the time of inference or prediction.
The post From MLOps to LLMOps— and hardware headaches ahead appeared first on Data Science Central.
( 22
min )
This article explores the versatile applications of healthcare chatbots, shedding light on their transformative impact on patient care and medical processes.
The post Revolutionizing healthcare with chatbots: A humanized exploration appeared first on Data Science Central.
( 20
min )
Swarms of autonomous interactive drones, with the support of recharging
technology, can provide compelling sensing capabilities in Smart Cities, such
as traffic monitoring and disaster response. Existing approaches, including
distributed optimization and deep reinforcement learning (DRL), aim to
coordinate drones to achieve cost-effective, high-quality navigation, sensing,
and charging. However, they face grand challenges: short-term optimization is
not effective in dynamic environments with unanticipated changes, while
long-term learning lacks scalability, resilience, and flexibility. To bridge
this gap, this paper introduces a new progressive approach that combines
short-term plan generation and selection based on distributed optimization with
a DRL-based long-term strategic scheduling of flying direction. Extensive
experimentation with datasets generated from realistic urban mobility
underscores the outstanding performance of the proposed solution compared to
the state of the art. We also provide compelling new insights about the role of
drone density in different sensing missions, the energy safety of drone
operations and how to prioritize investments for key locations of charging
infrastructure.
( 2
min )
The recent introduction of the Least-Squares Support Vector Regression
(LS-SVR) algorithm for solving differential and integral equations has sparked
interest. In this study, we expand the application of this algorithm to address
systems of differential-algebraic equations (DAEs). Our work presents a novel
approach to solving general DAEs in an operator format by establishing
connections between the LS-SVR machine learning model, weighted residual
methods, and Legendre orthogonal polynomials. To assess the effectiveness of
our proposed method, we conduct simulations involving various DAE scenarios,
such as nonlinear systems, fractional-order derivatives, integro-differential,
and partial DAEs. Finally, we carry out comparisons between our proposed method
and currently established state-of-the-art approaches, demonstrating its
reliability and effectiveness.
( 2
min )
The generation of undesirable and factually incorrect content by large
language models poses a significant challenge and remains a largely unsolved
issue. This paper studies the integration of a contrastive learning objective
for fine-tuning LLMs for implicit knowledge editing and controlled text
generation. Optimizing the training objective entails aligning text
perplexities in a contrastive fashion. To facilitate training the model in a
self-supervised fashion, we leverage an off-the-shelf LLM for training data
generation. We showcase applicability in the domain of detoxification. Herein,
the proposed approach leads to a significant decrease in the generation of
toxic content while preserving general utility for downstream tasks such as
commonsense reasoning and reading comprehension. The proposed approach is
conceptually simple but empirically powerful.
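The quantity being aligned here, perplexity, is the exponential of the average negative log-likelihood per token. The sketch below shows only that standard definition, using hand-picked token probabilities; it does not reproduce the paper's contrastive objective, whose exact form the abstract does not state. The names `likely_text` and `unlikely_text` are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of a sequence given its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical sequences: the model assigns each token probability 0.5 or 0.1.
likely_text = [math.log(0.5)] * 4
unlikely_text = [math.log(0.1)] * 4

ppl_likely = perplexity(likely_text)      # about 2
ppl_unlikely = perplexity(unlikely_text)  # about 10
```

A contrastive setup would aim to keep perplexity low on desired text and high on undesired text, so a gap like the one between these two values is what such training tries to widen.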
( 2
min )
Electronic Health Record (EHR) data, while rich in information, often suffers
from sparsity, posing significant challenges in predictive modeling.
Traditional imputation methods inadequately distinguish between real and
imputed data, leading to potential inaccuracies in models. Addressing this, we
introduce PRISM, a novel approach that indirectly imputes data through
prototype representations of similar patients, thus ensuring denser and more
accurate embeddings. PRISM innovates further with a feature confidence learner
module, which evaluates the reliability of each feature in light of missing
data. Additionally, it incorporates a novel patient similarity metric that
accounts for feature confidence, avoiding overreliance on imprecise imputed
values. Our extensive experiments on the MIMIC-III and MIMIC-IV datasets
demonstrate PRISM's superior performance in predicting in-hospital mortality
and 30-day readmission tasks, showcasing its effectiveness in handling EHR data
sparsity. For the sake of reproducibility and further research, we have made
the code publicly available at https://github.com/yhzhu99/PRISM.
( 2
min )
"You never forget how to ride a bike", -- but how is that possible? The brain
is able to learn complex skills, stop the practice for years, learn other
skills in between, and still retrieve the original knowledge when necessary.
The mechanisms of this capability, referred to as lifelong learning (or
continual learning, CL), are unknown. We suggest a bio-plausible
meta-plasticity rule building on classical work in CL which we summarize in two
principles: (i) neurons are context selective, and (ii) a local availability
variable partially freezes the plasticity if the neuron was relevant for
previous tasks. In a new neuro-centric formalization of these principles, we
suggest that neuron selectivity and neuron-wide consolidation is a simple and
viable meta-plasticity hypothesis to enable CL in the brain. In simulation,
this simple model balances forgetting and consolidation leading to better
transfer learning than contemporary CL algorithms on image recognition and
natural language processing CL benchmarks.
( 2
min )
Edge Intelligence (EI) integrates Edge Computing (EC) and Artificial
Intelligence (AI) to push the capabilities of AI to the network edge for
real-time, efficient and secure intelligent decision-making and computation.
However, EI faces various challenges due to resource constraints, heterogeneous
network environments, and diverse service requirements of different
applications, which together affect the trustworthiness of EI in the eyes of
stakeholders. This survey comprehensively summarizes the characteristics,
architecture, technologies, and solutions of trustworthy EI. Specifically, we
first emphasize the need for trustworthy EI in the context of the trend toward
large models. We then provide an initial definition of trustworthy EI, explore
its key characteristics and give a multi-layered architecture for trustworthy
EI. Then, we summarize several important issues that hinder the achievement of
trustworthy EI. Subsequently, we present enabling technologies for trustworthy
EI systems and provide an in-depth literature review of the state-of-the-art
solutions for realizing the trustworthiness of EI. Finally, we discuss the
corresponding research challenges and open issues.
( 2
min )
Depression is a global burden and one of the most challenging mental health
conditions to control. Experts can detect its severity early using the Beck
Depression Inventory (BDI) questionnaire, administer appropriate medication to
patients, and impede its progression. Due to the fear of potential
stigmatization, many patients turn to social media platforms like Reddit for
advice and assistance at various stages of their journey. This research
extracts text from Reddit to facilitate the diagnostic process. It employs a
proposed labeling approach to categorize the text and subsequently fine-tunes
the Longformer model. The model's performance is compared against baseline
models, including Naive Bayes, Random Forest, Support Vector Machines, and
Gradient Boosting. Our findings reveal that the Longformer model outperforms
the baseline models in both English (48%) and Luganda (45%) languages on a
custom-made dataset.
( 2
min )
Label-free cell classification is advantageous for supplying pristine cells
for further use or examination, yet existing techniques frequently fall short
in terms of specificity and speed. In this study, we address these limitations
through the development of a novel machine learning framework, Multiplex Image
Machine Learning (MIML). This architecture uniquely combines label-free cell
images with biomechanical property data, harnessing the vast, often
underutilized morphological information intrinsic to each cell. By integrating
both types of data, our model offers a more holistic understanding of the
cellular properties, utilizing morphological information typically discarded in
traditional machine learning models. This approach has led to a remarkable
98.3% accuracy in cell classification, a substantial improvement over models
that only consider a single data type. MIML has been proven effective in
classifying white blood cells and tumor cells, with potential for broader
application due to its inherent flexibility and transfer learning capability.
It's particularly effective for cells with similar morphology but distinct
biomechanical properties. This innovative approach has significant implications
across various fields, from advancing disease diagnostics to understanding
cellular behavior.
( 3
min )
Due to the continuous change in operational data, AIOps solutions suffer from
performance degradation over time. Although periodic retraining is the
state-of-the-art technique to preserve the failure prediction AIOps models'
performance over time, this technique requires a considerable amount of labeled
data to retrain. In AIOps, obtaining labeled data is expensive since it requires
the availability of domain experts to intensively annotate it. In this paper,
we present McUDI, a model-centric unsupervised degradation indicator that is
capable of detecting the exact moment the AIOps model requires retraining as a
result of changes in data. We further show how employing McUDI in the
maintenance pipeline of AIOps solutions can reduce the number of samples that
require annotations by 30k for job failure prediction and 260k for disk
failure prediction while achieving performance similar to periodic
retraining.
( 2
min )
Speech foundation models (SFMs) have been benchmarked on many speech
processing tasks, often achieving state-of-the-art performance with minimal
adaptation. However, the SFM paradigm has been significantly less explored for
applications of interest to the speech perception community. In this paper we
present a systematic evaluation of 10 SFMs on one such application: Speech
intelligibility prediction. We focus on the non-intrusive setup of the Clarity
Prediction Challenge 2 (CPC2), where the task is to predict the percentage of
words correctly perceived by hearing-impaired listeners from speech-in-noise
recordings. We propose a simple method that learns a lightweight specialized
prediction head on top of frozen SFMs to approach the problem. Our results
reveal statistically significant differences in performance across SFMs. Our
method resulted in the winning submission in the CPC2, demonstrating its
promise for speech perception applications.
( 2
min )
There is an evident lack of implementation of Machine Learning (ML) in the
legal domain in India, and any research that does take place in this domain is
usually based on data from the higher courts of law and works with English
data. The lower courts and data from the different regional languages of India
are often overlooked. In this paper, we deploy a Convolutional Neural Network
(CNN) architecture on a corpus of Hindi legal documents. We perform a bail
prediction task with the help of a CNN model and achieve an overall accuracy of
93%, which is an improvement on the benchmark accuracy set by Kapoor et al.
(2022), albeit in data from 20 districts of the Indian state of Uttar Pradesh.
( 2
min )
We propose a novel algorithm for the support estimation of partially known
Gaussian graphical models that incorporates prior information about the
underlying graph. In contrast to classical approaches that provide a point
estimate based on a maximum likelihood or a maximum a posteriori criterion
using (simple) priors on the precision matrix, we consider a prior on the graph
and rely on annealed Langevin diffusion to generate samples from the posterior
distribution. Since the Langevin sampler requires access to the score function
of the underlying graph prior, we use graph neural networks to effectively
estimate the score from a graph dataset (either available beforehand or
generated from a known distribution). Numerical experiments demonstrate the
benefits of our approach.
( 2
min )
Early diagnosis of Alzheimer's disease (AD) is a challenging task due to
its subtle and complex clinical symptoms. Deep learning-assisted medical
diagnosis using image recognition techniques has become an important research
topic in this field. The features have to accurately capture the main variations
of anatomical brain structures. However, feature extraction through deep
learning training is time-consuming and expensive. This study proposes a novel
Alzheimer's disease detection model based on Convolutional Neural Networks. The
model utilizes a pre-trained ResNet network as the backbone, incorporating a
post-fusion algorithm for 3D medical images and attention mechanisms. The
experimental results indicate that the employed 2D fusion algorithm effectively
reduces the model's training expense, and the introduced attention mechanism
accurately weights important regions in images, further enhancing the model's
diagnostic accuracy.
( 2
min )
Semantic segmentation enables robots to perceive and reason about their
environments beyond geometry. Most such systems build upon deep learning
approaches. As autonomous robots are commonly deployed in initially unknown
environments, pre-training on static datasets cannot always capture the variety
of domains and limits the robot's perception performance during missions.
Recently, self-supervised and fully supervised active learning methods emerged
to improve a robot's vision. These approaches rely on large in-domain
pre-training datasets or require substantial human labelling effort. We propose
a planning method for semi-supervised active learning of semantic segmentation
that substantially reduces human labelling requirements compared to fully
supervised approaches. We leverage an adaptive map-based planner guided towards
the frontiers of unexplored space with high model uncertainty collecting
training data for human labelling. A key aspect of our approach is to combine
the sparse high-quality human labels with pseudo labels automatically extracted
from highly certain environment map areas. Experimental results show that our
method reaches segmentation performance close to fully supervised approaches
with drastically reduced human labelling effort while outperforming
self-supervised approaches.
( 2
min )
The majority of the research on the quantization of Deep Neural Networks
(DNNs) is focused on reducing the precision of tensors visible by high-level
frameworks (e.g., weights, activations, and gradients). However, current
hardware still relies on high-accuracy core operations. Most significant is the
operation of accumulating products. This high-precision accumulation operation
is gradually becoming the main computational bottleneck. This is because, so
far, the usage of low-precision accumulators led to a significant degradation
in performance. In this work, we present a simple method to train and fine-tune
high-end DNNs, to allow, for the first time, utilization of cheaper, $12$-bit
accumulators, with no significant degradation in accuracy. Lastly, we show that
as we decrease the accumulation precision further, using fine-grained gradient
approximations can improve the DNN accuracy.
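To make the accumulation bottleneck concrete, the following minimal sketch simulates a dot product whose running sum lives in a narrow signed integer accumulator. It is an illustration only, not the training method from the abstract: the function name `saturating_dot` and the choice to saturate (clamp) on overflow rather than wrap around are assumptions made here.

```python
def saturating_dot(xs, ws, bits=12):
    """Dot product of integer vectors using a signed accumulator of `bits` bits.

    Clamping on overflow is an assumption of this sketch; real hardware
    may wrap around instead.
    """
    lo = -(1 << (bits - 1))      # smallest representable value, e.g. -2048
    hi = (1 << (bits - 1)) - 1   # largest representable value, e.g. 2047
    acc = 0
    for x, w in zip(xs, ws):
        # Add one product, then clamp the running sum to the accumulator range.
        acc = max(lo, min(hi, acc + x * w))
    return acc


# Products whose exact sum is 0, but whose partial sums exceed 12 bits.
xs = [100] * 30 + [-100] * 30
ws = [1] * 60

exact = sum(x * w for x, w in zip(xs, ws))
narrow = saturating_dot(xs, ws, bits=12)
wide = saturating_dot(xs, ws, bits=32)
```

The partial sums overflow the 12-bit range before the negative products arrive, so the narrow result differs from the exact one even though the final value would fit comfortably.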
( 2
min )
K-fold cross-validation is a widely used tool for assessing classifier
performance. The reproducibility crisis faced by artificial intelligence partly
results from the irreproducibility of reported k-fold cross-validation-based
performance scores. Recently, we introduced numerical techniques to test the
consistency of claimed performance scores and experimental setups. In a crucial
use case, the method relies on the combinatorial enumeration of all k-fold
configurations, for which we proposed an algorithm in the binary classification
case.
( 2
min )
We consider the problem of learning linear operators under squared loss
between two infinite-dimensional Hilbert spaces in the online setting. We show
that the class of linear operators with uniformly bounded $p$-Schatten norm is
online learnable for any $p \in [1, \infty)$. On the other hand, we prove an
impossibility result by showing that the class of uniformly bounded linear
operators with respect to the operator norm is \textit{not} online learnable.
Moreover, we show a separation between sequential uniform convergence and
online learnability by identifying a class of bounded linear operators that is
online learnable but for which uniform convergence does not hold. Finally, we prove that
the impossibility result and the separation between uniform convergence and
learnability also hold in the batch setting.
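For readers unfamiliar with the norms involved, here is a finite-dimensional sketch (the abstract itself concerns infinite-dimensional operators). The helper `schatten_norm` is a hypothetical name introduced here; it computes the Schatten $p$-norm as the $\ell_p$ norm of the singular values, with $p = \infty$ giving the operator norm.

```python
import numpy as np


def schatten_norm(A, p):
    """Schatten p-norm of a matrix: the l_p norm of its singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    if np.isinf(p):
        return float(s.max())          # operator (spectral) norm
    return float((s ** p).sum() ** (1.0 / p))


# The n x n identity has operator norm 1 for every n,
# while its Schatten p-norm for finite p is n ** (1 / p).
n = 16
I = np.eye(n)
nuclear = schatten_norm(I, 1)          # sum of singular values
frobenius = schatten_norm(I, 2)        # equals the Frobenius norm
operator = schatten_norm(I, np.inf)    # largest singular value
```

The identity example hints at why the two classes behave differently: a bound on the operator norm does not bound any Schatten $p$-norm with finite $p$ as the dimension grows.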
( 2
min )
This paper considers stochastic weakly convex optimization without the
standard Lipschitz continuity assumption. Based on new adaptive regularization
(stepsize) strategies, we show that a wide class of stochastic algorithms,
including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 /
\sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on
rather weak assumptions: the Lipschitz parameter can be either bounded by a
general growth function of $\|x\|$ or locally estimated through independent
random samples.
( 2
min )
We propose two graph neural network layers for graphs with features in a
Riemannian manifold. First, based on a manifold-valued graph diffusion
equation, we construct a diffusion layer that can be applied to an arbitrary
number of nodes and graph connectivity patterns. Second, we model a tangent
multilayer perceptron by transferring ideas from the vector neuron framework to
our general setting. Both layers are equivariant with respect to node
permutations and isometries of the feature manifold. These properties have been
shown to lead to a beneficial inductive bias in many deep learning tasks.
Numerical examples on synthetic data as well as on triangle meshes of the right
hippocampus to classify Alzheimer's disease demonstrate the very good
performance of our layers.
( 2
min )
Implicit neural representations (INRs) are a rapidly growing research field,
which provides alternative ways to represent multimedia signals. Recent
applications of INRs include image super-resolution, compression of
high-dimensional signals, or 3D rendering. However, these solutions usually
focus on visual data, and adapting them to the audio domain is not trivial.
Moreover, it requires a separately trained model for every data sample. To
address this limitation, we propose HyperSound, a meta-learning method
leveraging hypernetworks to produce INRs for audio signals unseen at training
time. We show that our approach can reconstruct sound waves with quality
comparable to other state-of-the-art models.
( 2
min )
The recent advances in natural language processing have predominantly favored
well-resourced English-centric models, resulting in a significant gap with
low-resource languages. In this work, we introduce the language model TURNA,
which is developed for the low-resource language Turkish and is capable of both
natural language understanding and generation tasks. TURNA is pretrained with
an encoder-decoder architecture based on the unified framework UL2 with a
diverse corpus that we specifically curated for this purpose. We evaluated
TURNA with three generation tasks and five understanding tasks for Turkish. The
results show that TURNA outperforms several multilingual models in both
understanding and generation tasks, and competes with monolingual Turkish
models in understanding tasks. TURNA is made available at
https://huggingface.co/boun-tabi-LMG/TURNA .
( 2
min )
Machine learning typically presupposes classical probability theory which
implies that aggregation is built upon expectation. There are now multiple
reasons to motivate looking at richer alternatives to classical probability
theory as a mathematical foundation for machine learning. We systematically
examine a powerful and rich class of alternative aggregation functionals, known
variously as spectral risk measures, Choquet integrals or Lorentz norms. We
present a range of characterization results, and demonstrate what makes this
spectral family so special. In doing so we arrive at a natural stratification
of all coherent risk measures in terms of the upper probabilities that they
induce by exploiting results from the theory of rearrangement invariant Banach
spaces. We empirically demonstrate how this new approach to uncertainty helps
tackle practical machine learning problems.
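As a rough illustration of what replacing expectation with a spectral aggregation looks like on a finite sample, the sketch below computes a weighted average of the sorted losses with non-negative, non-decreasing weights that sum to one. The function names and the simplified "average of the k worst losses" weighting are assumptions made for this example, not constructions taken from the paper.

```python
def spectral_risk(losses, weights, tol=1e-12):
    """Weighted average of the losses sorted in increasing order.

    The weights must be non-negative, non-decreasing and sum to 1,
    so that larger losses never receive less weight than smaller ones.
    """
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if any(b < a - tol for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-decreasing")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * l for w, l in zip(weights, sorted(losses)))


def uniform_weights(n):
    """Uniform weights recover the ordinary sample mean (expectation)."""
    return [1.0 / n] * n


def worst_k_weights(n, k):
    """Equal weight on the k largest losses, zero elsewhere."""
    return [0.0] * (n - k) + [1.0 / k] * k


losses = [3.0, 1.0, 4.0, 2.0]
mean_risk = spectral_risk(losses, uniform_weights(4))      # plain average
tail_risk = spectral_risk(losses, worst_k_weights(4, 2))   # average of the 2 worst
max_risk = spectral_risk(losses, worst_k_weights(4, 1))    # worst case
```

Moving weight toward the largest losses interpolates between the mean and the maximum, which is the sense in which these functionals generalize expectation.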
( 2
min )
With the rapid advancement in cyber-physical systems, the increasing number
of sensors has significantly complicated manual monitoring of system states.
Consequently, graph-based time-series anomaly detection methods have gained
attention due to their ability to explicitly represent relationships between
sensors. However, these methods often apply a uniform source node
representation across all connected target nodes, even when updating different
target node representations. Moreover, the graph attention mechanism, commonly
used to infer unknown graph structures, could constrain the diversity of source
node representations. In this paper, we introduce the Edge Conditional
Node-update Graph Neural Network (ECNU-GNN). Our model, equipped with an edge
conditional node update module, dynamically transforms source node
representations based on connected edges to represent target nodes aptly. We
validate performance on three real-world datasets: SWaT, WADI, and PSM. Our
model demonstrates 5.4%, 12.4%, and 6.0% higher performance, respectively,
compared to the baseline models with the best F1 scores.
( 2
min )
In this study, we explore the synergy of deep learning and financial market
applications, focusing on pair trading. This market-neutral strategy is
integral to quantitative finance and is apt for advanced deep-learning
techniques. A pivotal challenge in pair trading is discerning temporal
correlations among entities, necessitating the integration of diverse data
modalities. Addressing this, we introduce a novel framework, Multi-modal
Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and
discrete features into a temporal graph and employs a memory-based temporal
graph neural network. This approach reframes temporal correlation
identification as a temporal graph link prediction task, which has shown
empirical success. Our experiments on real-world datasets confirm the superior
performance of MTRGL, emphasizing its promise in refining automated pair
trading strategies.
( 2
min )
Federated Learning (FL) is a promising technique for the collaborative
training of deep neural networks across multiple devices while preserving data
privacy. Despite its potential benefits, FL is hindered by excessive
communication costs due to repeated server-client communication during
training. To address this challenge, model compression techniques, such as
sparsification and weight clustering, are applied. These often require modifying
the underlying model aggregation schemes or involve cumbersome hyperparameter
tuning, with the latter not only adjusting the model's compression rate but also
limiting the model's potential for continuous improvement over growing data. In this
paper, we propose FedCompress, a novel approach that combines dynamic weight
clustering and server-side knowledge distillation to reduce communication costs
while learning highly generalizable models. Through a comprehensive evaluation
on diverse public datasets, we demonstrate the efficacy of our approach
compared to baselines in terms of communication costs and inference speed. We
will make our implementation public upon acceptance.
( 2
min )
We propose EEG-SimpleConv, a straightforward 1D convolutional neural network
for Motor Imagery decoding in BCI. Our main motivation is to propose a simple
and well-performing baseline to compare to, using only very standard ingredients
from the literature. We evaluate its performance on four EEG Motor Imagery
datasets, including simulated online setups, and compare it to recent Deep
Learning and Machine Learning approaches. EEG-SimpleConv is at least as good as or
far more efficient than other approaches, showing strong knowledge-transfer
capabilities across subjects, at the cost of a low inference time. We advocate
that using off-the-shelf ingredients rather than coming up with ad-hoc solutions
can significantly help the adoption of Deep Learning approaches for BCI. We
make the code of the models and the experiments accessible.
( 2
min )
This paper presents MoE-Infinity, a cost-efficient mixture-of-expert (MoE)
serving system that realizes activation-aware expert offloading. MoE-Infinity
features sequence-level expert activation tracing, a new approach adept at
identifying sparse activations and capturing the temporal locality of MoE
inference. By analyzing these traces, MoE-Infinity performs novel
activation-aware expert prefetching and caching, substantially reducing the
latency overheads usually associated with offloading experts for improved cost
performance. Extensive experiments in a cluster show that MoE-Infinity
outperforms numerous existing systems and approaches, reducing latency by
4-20X and decreasing deployment costs by over 8X for various MoEs. MoE-Infinity's
source code is publicly available at https://github.com/TorchMoE/MoE-Infinity
( 2
min )
Thin-layer chromatography (TLC) is a crucial technique in molecular polarity
analysis. Despite its importance, the interpretability of predictive models for
TLC, especially those driven by artificial intelligence, remains a challenge.
Current approaches, utilizing either high-dimensional molecular fingerprints or
domain-knowledge-driven feature engineering, often face a dilemma between
expressiveness and interpretability. To bridge this gap, we introduce
Unsupervised Hierarchical Symbolic Regression (UHiSR), combining hierarchical
neural networks and symbolic regression. UHiSR automatically distills
chemical-intuitive polarity indices, and discovers interpretable equations that
link molecular structure to chromatographic behavior.
( 2
min )
This paper examines the use of deep recurrent neural networks to classify
traffic patterns in smart cities. We propose a novel approach to traffic
pattern classification based on deep recurrent neural networks, which can
effectively capture traffic patterns' dynamic and sequential features. The
proposed model combines convolutional and recurrent layers to extract features
from traffic pattern data and a SoftMax layer to classify traffic patterns.
Experimental results show that the proposed model outperforms existing methods
regarding accuracy, precision, recall, and F1 score. Furthermore, we provide an
in-depth analysis of the results and discuss the implications of the proposed
model for smart cities. The results show that the proposed model can accurately
classify traffic patterns in smart cities with a precision as high as 95%.
The proposed model is evaluated on a real-world traffic pattern dataset and
compared with existing classification methods.
( 2
min )
In this paper, we describe the TTS models developed by NVIDIA for the
MMITS-VC (Multi-speaker, Multi-lingual Indic TTS with Voice Cloning) 2024
Challenge. In Tracks 1 and 2, we utilize RAD-MMM to perform few-shot TTS by
training additionally on 5 minutes of target speaker data. In Track 3, we
utilize P-Flow to perform zero-shot TTS by training on the challenge dataset as
well as external datasets. We use HiFi-GAN vocoders for all submissions.
RAD-MMM performs competitively on Tracks 1 and 2, while P-Flow ranks first on
Track 3, with a mean opinion score (MOS) of 4.4 and a speaker similarity score
(SMOS) of 3.62.
( 2
min )
The application of process mining to unstructured data might yield
significant novel insights in disciplines where unstructured data is a common
data format. Efficiently analyzing unstructured data by process mining and
conveying confidence in the analysis result requires overcoming multiple
challenges. The purpose of this paper is to discuss these challenges, present
initial solutions and describe future research directions. We hope that this
article lays the foundations for future collaboration on this topic.
( 2
min )
We establish a layer-wise parameterization for 1D convolutional neural
networks (CNNs) with built-in end-to-end robustness guarantees. In doing so, we
use the Lipschitz constant of the input-output mapping characterized by a CNN
as a robustness measure. We base our parameterization on the Cayley transform
that parameterizes orthogonal matrices and the controllability Gramian of the
state space representation of the convolutional layers. The proposed
parameterization by design fulfills linear matrix inequalities that are
sufficient for Lipschitz continuity of the CNN, which further enables
unconstrained training of Lipschitz-bounded 1D CNNs. Finally, we train
Lipschitz-bounded 1D CNNs for the classification of heart arrhythmia data and
show their improved robustness.
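The Cayley transform mentioned above can be sketched in a few lines. This shows only that one ingredient: how an unconstrained square matrix yields an orthogonal one. It does not reproduce the paper's full parameterization, which also involves the controllability Gramian. The function name `cayley_orthogonal` and the use of NumPy are assumptions of this example.

```python
import numpy as np


def cayley_orthogonal(M):
    """Map an arbitrary square matrix M to an orthogonal matrix.

    A = M - M^T is skew-symmetric, so I + A is always invertible and
    Q = (I + A)^{-1} (I - A) satisfies Q^T Q = I.
    """
    A = M - M.T
    I = np.eye(M.shape[0])
    # Solve (I + A) Q = (I - A) instead of forming the inverse explicitly.
    return np.linalg.solve(I + A, I - A)


rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))   # free, unconstrained parameters
Q = cayley_orthogonal(M)
```

Because every choice of `M` gives a valid orthogonal `Q`, a gradient-based optimizer can update `M` freely, which is the sense in which such parameterizations allow unconstrained training.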
( 2
min )
Correlation clustering is a well-known unsupervised learning setting that
deals with positive and negative pairwise similarities. In this paper, we study
the case where the pairwise similarities are not given in advance and must be
queried in a cost-efficient way. Thereby, we develop a generic active learning
framework for this task that benefits from several advantages, e.g.,
flexibility in the type of feedback that a user/annotator can provide,
adaptation to any correlation clustering algorithm and query strategy, and
robustness to noise. In addition, we propose and analyze a number of novel
query strategies suited to this setting. We demonstrate the effectiveness of
our framework and the proposed query strategies via several experimental
studies.
( 2
min )
In this study, we examine the representation learning abilities of Denoising
Diffusion Models (DDM) that were originally purposed for image generation. Our
philosophy is to deconstruct a DDM, gradually transforming it into a classical
Denoising Autoencoder (DAE). This deconstructive procedure allows us to explore
how various components of modern DDMs influence self-supervised representation
learning. We observe that only a very few modern components are critical for
learning good representations, while many others are nonessential. Our study
ultimately arrives at an approach that is highly simplified and to a large
extent resembles a classical DAE. We hope our study will rekindle interest in a
family of classical methods within the realm of modern self-supervised
learning.
( 2
min )
The rapid development of large language models has revolutionized code
intelligence in software development. However, the predominance of
closed-source models has restricted extensive research and development. To
address this, we introduce the DeepSeek-Coder series, a range of open-source
code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion
tokens. These models are pre-trained on a high-quality project-level code
corpus and employ a fill-in-the-blank task with a 16K window to enhance code
generation and infilling. Our extensive evaluations demonstrate that
DeepSeek-Coder not only achieves state-of-the-art performance among open-source
code models across multiple benchmarks but also surpasses existing
closed-source models like Codex and GPT-3.5. Furthermore, DeepSeek-Coder models
are under a permissive license that allows for both research and unrestricted
commercial use.
( 2
min )
As the number of accepted papers at AI and ML conferences reaches into the
thousands, it has become unclear how researchers access and read research
publications. In this paper, we investigate the role of social media
influencers in enhancing the visibility of machine learning research,
particularly the citation counts of papers they share. We have compiled a
comprehensive dataset of over 8,000 papers, spanning tweets from December 2018
to October 2023, alongside 1:1 matched controls based on publication year,
venue, and abstract topics. Our analysis reveals a significant increase in
citations for papers endorsed by these influencers, with median citation counts
2-3 times higher than those of the control group. Additionally, the study
delves into the geographic, gender, and institutional diversity of highlighted
authors. These findings highlight the expanding influence of social media in
scholarly communication and underscore the importance of an evolving ecosystem
in today's digital academic landscape.
( 2
min )
We develop a novel multiple hypothesis testing correction with family-wise
error rate (FWER) control that efficiently exploits positive dependencies
between potentially correlated statistical hypothesis tests. Our proposed
algorithm $\texttt{max-rank}$ is conceptually straightforward, relying on the
use of a $\max$-operator in the rank domain of computed test statistics. We
compare our approach to the frequently employed Bonferroni correction,
theoretically and empirically demonstrating its superiority over Bonferroni in
the case of existing positive dependency, and its equivalence otherwise. Our
advantage over Bonferroni increases as the number of tests rises, and we
maintain high statistical power whilst ensuring FWER control. We specifically
frame our algorithm in the context of parallel permutation testing, a scenario
that arises in our primary application of conformal prediction, a recently
popularized approach for quantifying uncertainty in complex predictive
settings.
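For reference, the Bonferroni baseline that the abstract compares against can be written as follows. This sketch covers only the standard correction, not the proposed max-rank procedure; the function name `bonferroni_reject` and the example p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject hypothesis i if p_i <= alpha / m.

    Controls the family-wise error rate at level alpha for m tests,
    regardless of how the tests depend on each other.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Four hypothetical p-values; the per-test threshold is 0.05 / 4 = 0.0125.
p_values = [0.001, 0.012, 0.03, 0.2]
decisions = bonferroni_reject(p_values, alpha=0.05)
```

Since the threshold shrinks in proportion to the number of tests, the correction becomes increasingly conservative as tests are added, which is the regime where a method exploiting positive dependence has the most room to gain power.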
( 2
min )
In this work we undertake a thorough study of the non-asymptotic properties
of the vanilla generative adversarial networks (GANs). We prove an oracle
inequality for the Jensen-Shannon (JS) divergence between the underlying
density $\mathsf{p}^*$ and the GAN estimate with a significantly better
statistical error term compared to the previously known results. The advantage
of our bound becomes clear in application to nonparametric density estimation.
We show that the JS-divergence between the GAN estimate and $\mathsf{p}^*$
decays as fast as $(\log{n}/n)^{2\beta/(2\beta + d)}$, where $n$ is the sample
size and $\beta$ determines the smoothness of $\mathsf{p}^*$. This rate of
convergence coincides (up to logarithmic factors) with the minimax optimal rate for the
considered class of densities.
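As a reminder of the quantity being bounded, the sketch below computes the Jensen-Shannon divergence between two discrete distributions. The discrete setting, the natural logarithm, and the $1/2$ weighting are choices made for this example; the paper works with densities and its normalization may differ.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


same = js_divergence([0.2, 0.8], [0.2, 0.8])       # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # disjoint supports
```

Unlike the KL divergence, this quantity is symmetric and stays finite (at most $\log 2$ with natural logarithms) even when the supports do not overlap.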
( 2
min )
This post provides three guided steps to architect risk management strategies while developing generative AI applications using LLMs. We first delve into the vulnerabilities, threats, and risks that arise from the implementation, deployment, and use of LLM solutions, and provide guidance on how to start innovating with security in mind. We then discuss how building on a secure foundation is essential for generative AI. Lastly, we connect these together with an example LLM workload to describe an approach towards architecting with defense-in-depth security across trust boundaries.
( 22
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Sponsored Post Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )